Yong Man Ro

KAIST

H-index: 52

Asia-South Korea

About Yong Man Ro

Yong Man Ro is a researcher at KAIST with an h-index of 52 and a recent h-index of 35 (since 2020). He specializes in multimodal learning, vision-language integration, image processing, and computer vision.

His recent articles include:

Akvsr: Audio knowledge empowered visual speech recognition by compressing audio knowledge of a pretrained model

Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection

Integrating Language-Derived Appearance Elements with Visual Cues in Pedestrian Detection

Robust pedestrian detection via constructing versatile pedestrian knowledge bank

TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages

Improving Open Set Recognition via Visual Prompts Distilled from Common-Sense Knowledge

Text-Driven Talking Face Synthesis by Reprogramming Audio-Driven Models

Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing

Yong Man Ro Information

University: KAIST
Position: Professor of Electrical Engineering
Citations (all): 10490
Citations (since 2020): 4626
Cited by: 7671
h-index (all): 52
h-index (since 2020): 35
i10-index (all): 233
i10-index (since 2020): 116


Yong Man Ro Skills & Research Interests

Multimodal learning

Vision Language integration

Image processing and Computer vision

Top articles of Yong Man Ro

Title: Akvsr: Audio knowledge empowered visual speech recognition by compressing audio knowledge of a pretrained model
Journal: IEEE Transactions on Multimedia
Author(s): Jeong Hun Yeo, Minsu Kim, Jeongsoo Choi, Dae Hoe Kim, Yong Man Ro
Publication Date: 2024/1/10

Title: Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection
Journal: arXiv preprint arXiv:2403.01300
Author(s): Taeheon Kim, Sebin Shin, Youngjoon Yu, Hak Gu Kim, Yong Man Ro
Publication Date: 2024/3/2

Title: Integrating Language-Derived Appearance Elements with Visual Cues in Pedestrian Detection
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Author(s): Sungjune Park, Hyunjun Kim, Yong Man Ro
Publication Date: 2024/4/1

Title: Robust pedestrian detection via constructing versatile pedestrian knowledge bank
Journal: Pattern Recognition
Author(s): Sungjune Park, Hyunjun Kim, Yong Man Ro
Publication Date: 2024/4/26

Title: TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
Journal: arXiv preprint arXiv:2402.16021
Author(s): Minsu Kim, Jee-weon Jung, Hyeongseop Rha, Soumi Maiti, Siddhant Arora, ...
Publication Date: 2024/2/25

Title: Improving Open Set Recognition via Visual Prompts Distilled from Common-Sense Knowledge
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Author(s): Seongyeop Kim, Hyung-Il Kim, Yong Man Ro
Publication Date: 2024/3/24

Title: Text-Driven Talking Face Synthesis by Reprogramming Audio-Driven Models
Author(s): Jeongsoo Choi, Minsu Kim, Se Jin Park, Yong Man Ro
Publication Date: 2024/4/14

Title: Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing
Journal: arXiv preprint arXiv:2402.15151
Author(s): Jeong Hun Yeo, Seunghee Han, Minsu Kim, Yong Man Ro
Publication Date: 2024/2/23

Title: MSCoTDet: Language-driven Multi-modal Fusion for Improved Multispectral Pedestrian Detection
Journal: arXiv preprint arXiv:2403.15209
Author(s): Taeheon Kim, Sangyun Chung, Damin Yeom, Youngjoon Yu, Hak Gu Kim, ...
Publication Date: 2024/3/22

Title: Visual Speech Recognition for Languages with Limited Labeled Data Using Automatic Labels from Whisper
Author(s): Jeong Hun Yeo, Minsu Kim, Shinji Watanabe, Yong Man Ro
Publication Date: 2024/4/14

Title: CoLLaVO: Crayon Large Language and Vision mOdel
Journal: arXiv preprint arXiv:2402.11248
Author(s): Byung-Kwan Lee, Beomchan Park, Chae Won Kim, Yong Man Ro
Publication Date: 2024/2/17

Title: What if...?: Counterfactual Inception to Mitigate Hallucination Effects in Large Multimodal Models
Journal: arXiv preprint arXiv:2403.13513
Author(s): Junho Kim, Yeon Ju Kim, Yong Man Ro
Publication Date: 2024/3/20

Title: Persona Extraction Through Semantic Similarity for Emotional Support Conversation Generation
Author(s): Seunghee Han, Se Jin Park, Chae Won Kim, Yong Man Ro
Publication Date: 2024/4/14

Title: Multilingual visual speech recognition with a single model by learning with discrete visual speech units
Journal: arXiv preprint arXiv:2401.09802
Author(s): Minsu Kim, Jeong Hun Yeo, Jeongsoo Choi, Se Jin Park, Yong Man Ro
Publication Date: 2024/1/18

Title: MoAI: Mixture of All Intelligence for Large Language and Vision Models
Journal: arXiv preprint arXiv:2403.07508
Author(s): Byung-Kwan Lee, Beomchan Park, Chae Won Kim, Yong Man Ro
Publication Date: 2024/3/12

Title: Towards practical and efficient image-to-speech captioning with vision-language pre-training and multi-modal tokens
Author(s): Minsu Kim, Jeongsoo Choi, Soumi Maiti, Jeong Hun Yeo, Shinji Watanabe, ...
Publication Date: 2024/4/14

Title: Causal unsupervised semantic segmentation
Journal: arXiv preprint arXiv:2310.07379
Author(s): Junho Kim, Byung-Kwan Lee, Yong Man Ro
Publication Date: 2023/10/11

Title: Demystifying causal features on adversarial examples and causal inoculation for robust network by adversarial instrumental variable regression
Author(s): Junho Kim, Byung-Kwan Lee, Yong Man Ro
Publication Date: 2023/6/19

Title: Meta input method and system and user-centered inference method and system via meta input for recycling of pretrained deep learning model
Publication Date: 2023/6/22

Title: Prompt tuning of deep neural networks for speaker-adaptive visual speech recognition
Journal: arXiv preprint arXiv:2302.08102
Author(s): Minsu Kim, Hyung-Il Kim, Yong Man Ro
Publication Date: 2023/2/16
