
Kai Yu（俞凯）

Shanghai Jiao Tong University

H-index: 49

Asia-China

About Kai Yu（俞凯）

Kai Yu（俞凯） is a distinguished researcher at Shanghai Jiao Tong University specializing in dialogue systems, speech recognition, speech synthesis, natural language processing, and machine learning. His h-index is 49 overall and 39 for citations since 2020.

His recent articles reflect the breadth of his research interests:

ChemDFM: Dialogue Foundation Model for Chemistry

VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching

VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech

Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding

Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback

Label-Aware Auxiliary Learning for Dialogue State Tracking

SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research

Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS

Kai Yu（俞凯） Information

University	Shanghai Jiao Tong University
Position	___
Citations (all)	8314
Citations (since 2020)	5084
Cited By	4910
h-index (all)	49
h-index (since 2020)	39
i10-index (all)	167
i10-index (since 2020)	127

Kai Yu（俞凯） Skills & Research Interests

dialogue system

speech recognition

speech synthesis

natural language processing

machine learning

Top articles of Kai Yu（俞凯）

Title	Venue	Author(s)	Publication Date
ChemDFM: Dialogue Foundation Model for Chemistry	arXiv preprint arXiv:2401.14818	Zihan Zhao Da Ma Lu Chen Liangtai Sun Zihao Li ...	2024/1/26
VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching		Yiwei Guo Chenpeng Du Ziyang Ma Xie Chen Kai Yu	2024/4/14
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech	arXiv preprint arXiv:2401.14321	Chenpeng Du Yiwei Guo Hankun Wang Yifan Yang Zhikang Niu ...	2024/1/25
Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding		Hongshen Xu Lu Chen Zihan Zhao Da Ma Ruisheng Cao ...	2024/3/4
Rejection Improves Reliability: Training LLMs to Refuse Unknown Questions Using RL from Knowledge Feedback	arXiv preprint arXiv:2403.18349	Hongshen Xu Zichen Zhu Da Ma Situo Zhang Shuai Fan ...	2024/3/27
Label-Aware Auxiliary Learning for Dialogue State Tracking		Yuncong Liu Lu Chen Kai Yu	2024/4/14
SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research	Proceedings of the AAAI Conference on Artificial Intelligence	Liangtai Sun Yang Han Zihan Zhao Da Ma Zhennan Shen ...	2024/3/24
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS		Yifan Yang Feiyu Shen Chenpeng Du Ziyang Ma Kai Yu ...	2024/4/14
Large Language Models Are Semi-Parametric Reinforcement Learning Agents	Advances in Neural Information Processing Systems	Danyang Zhang Lu Chen Situo Zhang Hongshen Xu Zihan Zhao ...	2024/2/13
Acoustic BPE for Speech Generation with Discrete Tokens		Feiyu Shen Yiwei Guo Chenpeng Du Xie Chen Kai Yu	2024/4/14
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding	Proceedings of the AAAI Conference on Artificial Intelligence	Chenpeng Du Yiwei Guo Feiyu Shen Zhijun Liu Zheng Liang ...	2024/3/24
MULTI: Multimodal Understanding Leaderboard with Text and Images	arXiv preprint arXiv:2402.03173	Zichen Zhu Yang Xu Lu Chen Jingkai Yang Yichuan Ma ...	2024/2/5
SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention		Junjie Li Yiwei Guo Xie Chen Kai Yu	2024/4/14
StoryTTS: A Highly Expressive Text-to-Speech Dataset with Rich Textual Expressiveness Annotations		Sen Liu Yiwei Guo Xie Chen Kai Yu	2024/4/14
DIR: A Large-Scale Dialogue Rewrite Dataset for Cross-Domain Conversational Text-to-SQL	Applied Sciences	Jieyu Li Zhi Chen Lu Chen Zichen Zhu Hanqi Li ...	2023/2/9
Enhance Temporal Relations in Audio Captioning with Sound Event Detection	arXiv preprint arXiv:2306.01533	Zeyu Xie Xuenan Xu Mengyue Wu Kai Yu	2023/6/2
Multi-Speaker End-to-End Multi-Modal Speaker Diarization System for the MISP 2022 Challenge		Tao Liu Zhengyang Chen Yanmin Qian Kai Yu	2023/6/4
Speaker Adaptive Text-to-Speech with Timbre-Normalized Vector-Quantized Feature	IEEE/ACM Transactions on Audio, Speech, and Language Processing	Chenpeng Du Yiwei Guo Xie Chen Kai Yu	2023/8/24
Iterative Noisy-Target Approach: Speech Enhancement Without Clean Speech		Yifan Zhang Wenbin Jiang Qing Zhuo Kai Yu	2023/12/8
On the Structural Generalization in Text-to-SQL	arXiv preprint arXiv:2301.04790	Jieyu Li Lu Chen Ruisheng Cao Su Zhu Hongshen Xu ...	2023/1/12