
Xuankai Chang

Carnegie Mellon University

H-index: 23

North America-United States

About Xuankai Chang

Xuankai Chang is a researcher at Carnegie Mellon University with an h-index of 23 (22 since 2020). His research focuses on automatic speech recognition and acoustic models.

Recent articles include:

Hypothesis stitcher for speech recognition of long-form audio

Improving audio captioning models with fine-grained audio features, text embedding supervision, and llm mix-up augmentation

TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages

Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer

Exploring speech recognition, translation, and understanding with discrete speech units: A comparative study

A Large-Scale Evaluation of Speech Foundation Models

VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks

Xuankai Chang Information

University	Carnegie Mellon University
Position	Student
Citations(all)	2558
Citations(since 2020)	2506
Cited By	379
hIndex(all)	23
hIndex(since 2020)	22
i10Index(all)	39
i10Index(since 2020)	38

Xuankai Chang Skills & Research Interests

Automatic Speech Recognition

Acoustic Models

Top articles of Xuankai Chang

Title	Journal	Author(s)	Publication Date
Hypothesis stitcher for speech recognition of long-form audio			2024/3/19
Improving audio captioning models with fine-grained audio features, text embedding supervision, and llm mix-up augmentation	IEEE ICASSP 2024	Shih-Lun Wu Xuankai Chang Gordon Wichern Jee-weon Jung François Germain ...	2024/4/14
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages	arXiv preprint arXiv:2402.16021	Minsu Kim Jee-weon Jung Hyeongseop Rha Soumi Maiti Siddhant Arora ...	2024/2/25
Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing		Brian Yan Xuankai Chang Antonios Anastasopoulos Yuya Fujita Shinji Watanabe	2024/4/14
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer	arXiv preprint arXiv:2401.16658	Yifan Peng Jinchuan Tian William Chen Siddhant Arora Brian Yan ...	2024/1/30
Exploring speech recognition, translation, and understanding with discrete speech units: A comparative study		Xuankai Chang Brian Yan Kwanghee Choi Jee-Weon Jung Yichen Lu ...	2024/4/14
A Large-Scale Evaluation of Speech Foundation Models	IEEE/ACM Transactions on Audio, Speech, and Language Processing	Shu-wen Yang Heng-Jui Chang Zili Huang Andy T Liu Cheng-I Lai ...	2024/4/16
VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks	IEEE ICASSP 2024	Soumi Maiti Yifan Peng Shukjae Choi Jee-weon Jung Xuankai Chang ...	2024/4/14
Audiogpt: Understanding and generating speech, music, sound, and talking head	Proceedings of the AAAI Conference on Artificial Intelligence	Rongjie Huang Mingze Li Dongchao Yang Jiatong Shi Xuankai Chang ...	2024/3/24
Hubertopic: Enhancing Semantic Representation of Hubert Through Self-Supervision Utilizing Topic Model		Takashi Maekaku Jiatong Shi Xuankai Chang Yuya Fujita Shinji Watanabe	2024/4/14
A study on the integration of pre-trained ssl, asr, lm and slu models for spoken language understanding		Yifan Peng Siddhant Arora Yosuke Higuchi Yushi Ueda Sujay Kumar ...	2023/1/9
Exploration of efficient end-to-end asr using discretized input from self-supervised learning	arXiv preprint arXiv:2305.18108	Xuankai Chang Brian Yan Yuya Fujita Takashi Maekaku Shinji Watanabe	2023/5/29
Tokensplit: Using discrete speech representations for direct, refined, and transcript-conditioned speech separation and recognition	arXiv preprint arXiv:2308.10415	Hakan Erdogan Scott Wisdom Xuankai Chang Zalán Borsos Marco Tagliasacchi ...	2023/8/21
Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning	arXiv preprint arXiv:2309.15317	William Chen Jiatong Shi Brian Yan Dan Berrebbi Wangyou Zhang ...	2023/9/26
End-to-end integration of speech recognition, dereverberation, beamforming, and self-supervised learning representation		Yoshiki Masuyama Xuankai Chang Samuele Cornell Shinji Watanabe Nobutaka Ono	2023/1/9
A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning	arXiv preprint arXiv:2305.13331	Jiyang Tang William Chen Xuankai Chang Shinji Watanabe Brian MacWhinney	2023/5/19
Reproducing whisper-style training using an open-source toolkit and publicly available data		Yifan Peng Jinchuan Tian Brian Yan Dan Berrebbi Xuankai Chang ...	2023/12/16
The chime-7 dasr challenge: Distant meeting transcription with multiple devices in diverse scenarios	arXiv preprint arXiv:2306.13734	Samuele Cornell Matthew Wiesner Shinji Watanabe Desh Raj Xuankai Chang ...	2023/6/23
SUPERB @ SLT 2022: Challenge on generalization and efficiency of self-supervised speech representation learning		Tzu-hsun Feng Annie Dong Ching-Feng Yeh Shu-wen Yang Tzu-Quan Lin ...	2023/1/9
ML-SUPERB: Multilingual speech universal performance benchmark		Jiatong Shi Dan Berrebbi William Chen Ho-Lam Chung En-Pei Hu ...	2023