Yong Man Ro

KAIST

H-index: 52

Asia-South Korea

About Yong Man Ro

Yong Man Ro is a researcher at KAIST with an h-index of 52 overall and 35 since 2020. He specializes in multimodal learning, vision-language integration, image processing, and computer vision.

His recent articles include:

AKVSR: Audio knowledge empowered visual speech recognition by compressing audio knowledge of a pretrained model

Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection

Integrating Language-Derived Appearance Elements with Visual Cues in Pedestrian Detection

Robust pedestrian detection via constructing versatile pedestrian knowledge bank

TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages

Improving Open Set Recognition via Visual Prompts Distilled from Common-Sense Knowledge

Text-Driven Talking Face Synthesis by Reprogramming Audio-Driven Models

Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing

Yong Man Ro Information

University	KAIST
Position	Professor of Electrical Engineering
Citations (all)	10490
Citations (since 2020)	4626
Cited by	7671
h-index (all)	52
h-index (since 2020)	35
i10-index (all)	233
i10-index (since 2020)	116

Yong Man Ro Skills & Research Interests

Multimodal learning

Vision Language integration

Image processing and Computer vision

Top articles of Yong Man Ro

Title	Journal	Author(s)	Publication Date
AKVSR: Audio knowledge empowered visual speech recognition by compressing audio knowledge of a pretrained model	IEEE Transactions on Multimedia	Jeong Hun Yeo Minsu Kim Jeongsoo Choi Dae Hoe Kim Yong Man Ro	2024/1/10
Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection	arXiv preprint arXiv:2403.01300	Taeheon Kim Sebin Shin Youngjoon Yu Hak Gu Kim Yong Man Ro	2024/3/2
Integrating Language-Derived Appearance Elements with Visual Cues in Pedestrian Detection	IEEE Transactions on Circuits and Systems for Video Technology	Sungjune Park Hyunjun Kim Yong Man Ro	2024/4/1
Robust pedestrian detection via constructing versatile pedestrian knowledge bank	Pattern Recognition	Sungjune Park Hyunjun Kim Yong Man Ro	2024/4/26
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages	arXiv preprint arXiv:2402.16021	Minsu Kim Jee-weon Jung Hyeongseop Rha Soumi Maiti Siddhant Arora ...	2024/2/25
Improving Open Set Recognition via Visual Prompts Distilled from Common-Sense Knowledge	Proceedings of the AAAI Conference on Artificial Intelligence	Seongyeop Kim Hyung-Il Kim Yong Man Ro	2024/3/24
Text-Driven Talking Face Synthesis by Reprogramming Audio-Driven Models		Jeongsoo Choi Minsu Kim Se Jin Park Yong Man Ro	2024/4/14
Where Visual Speech Meets Language: VSP-LLM Framework for Efficient and Context-Aware Visual Speech Processing	arXiv preprint arXiv:2402.15151	Jeong Hun Yeo Seunghee Han Minsu Kim Yong Man Ro	2024/2/23
MSCoTDet: Language-driven Multi-modal Fusion for Improved Multispectral Pedestrian Detection	arXiv preprint arXiv:2403.15209	Taeheon Kim Sangyun Chung Damin Yeom Youngjoon Yu Hak Gu Kim ...	2024/3/22
Visual Speech Recognition for Languages with Limited Labeled Data Using Automatic Labels from Whisper		Jeong Hun Yeo Minsu Kim Shinji Watanabe Yong Man Ro	2024/4/14
CoLLaVO: Crayon Large Language and Vision mOdel	arXiv preprint arXiv:2402.11248	Byung-Kwan Lee Beomchan Park Chae Won Kim Yong Man Ro	2024/2/17
What if...?: Counterfactual Inception to Mitigate Hallucination Effects in Large Multimodal Models	arXiv preprint arXiv:2403.13513	Junho Kim Yeon Ju Kim Yong Man Ro	2024/3/20
Persona Extraction Through Semantic Similarity for Emotional Support Conversation Generation		Seunghee Han Se Jin Park Chae Won Kim Yong Man Ro	2024/4/14
Multilingual visual speech recognition with a single model by learning with discrete visual speech units	arXiv preprint arXiv:2401.09802	Minsu Kim Jeong Hun Yeo Jeongsoo Choi Se Jin Park Yong Man Ro	2024/1/18
MoAI: Mixture of All Intelligence for Large Language and Vision Models	arXiv preprint arXiv:2403.07508	Byung-Kwan Lee Beomchan Park Chae Won Kim Yong Man Ro	2024/3/12
Towards practical and efficient image-to-speech captioning with vision-language pre-training and multi-modal tokens		Minsu Kim Jeongsoo Choi Soumi Maiti Jeong Hun Yeo Shinji Watanabe ...	2024/4/14
Causal unsupervised semantic segmentation	arXiv preprint arXiv:2310.07379	Junho Kim Byung-Kwan Lee Yong Man Ro	2023/10/11
Demystifying causal features on adversarial examples and causal inoculation for robust network by adversarial instrumental variable regression		Junho Kim Byung-Kwan Lee Yong Man Ro	2023/6/19
Meta input method and system and user-centered inference method and system via meta input for recycling of pretrained deep learning model			2023/6/22
Prompt tuning of deep neural networks for speaker-adaptive visual speech recognition	arXiv preprint arXiv:2302.08102	Minsu Kim Hyung-Il Kim Yong Man Ro	2023/2/16