Zhiyong WU (吴志勇)

Tsinghua University

H-index: 27

About Zhiyong WU (吴志勇)

Zhiyong WU (吴志勇) is a distinguished researcher at Tsinghua University with an h-index of 27 overall and 25 since 2020. His research focuses on speech synthesis and deep learning.

His recent articles reflect a diverse array of research interests and contributions to the field:

Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction

Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts

Conversational Co-Speech Gesture Generation via Modeling Dialog Intention, Emotion, and Context with Diffusion Models

SCNet: Sparse Compression Network for Music Source Separation

Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model

Enhancing Expressiveness in Dance Generation Via Integrating Frequency and Music Style Information

Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music Generation

Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations

Zhiyong WU (吴志勇) Information

University: Tsinghua University

Position: Associate Professor

Citations (all): 3100

Citations (since 2020): 2427

Cited By: 1255

h-index (all): 27

h-index (since 2020): 25

i10-index (all): 90

i10-index (since 2020): 73

Email:

University Profile Page: Tsinghua University

Google Scholar: View Google Scholar Profile

Zhiyong WU (吴志勇) Skills & Research Interests

Speech synthesis

Deep learning

Top articles of Zhiyong WU (吴志勇)

Exploiting Audio-Visual Features with Pretrained AV-HuBERT for Multi-Modal Dysarthric Speech Reconstruction
Journal: arXiv preprint arXiv:2401.17796
Authors: Xueyuan Chen, Yuejiao Wang, Xixin Wu, Disong Wang, Zhiyong Wu, et al.
Publication Date: 2024/1/31

Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts
Authors: Shun Lei, Yixuan Zhou, Liyang Chen, Dan Luo, Zhiyong Wu, et al.
Publication Date: 2024/4/14

Conversational Co-Speech Gesture Generation via Modeling Dialog Intention, Emotion, and Context with Diffusion Models
Authors: Haiwei Xue, Sicheng Yang, Zhensong Zhang, Zhiyong Wu, Minglei Li, et al.
Publication Date: 2024/4/14

SCNet: Sparse Compression Network for Music Source Separation
Journal: arXiv preprint arXiv:2401.13276
Authors: Weinan Tong, Jiaxu Zhu, Jun Chen, Shiyin Kang, Tao Jiang, et al.
Publication Date: 2024/1/24

Co-Speech Gesture Video Generation via Motion-Decoupled Diffusion Model
Journal: arXiv preprint arXiv:2404.01862
Authors: Xu He, Qiaochu Huang, Zhensong Zhang, Zhiwei Lin, Zhiyong Wu, et al.
Publication Date: 2024/4/2

Enhancing Expressiveness in Dance Generation Via Integrating Frequency and Music Style Information
Authors: Qiaochu Huang, Xu He, Boshi Tang, Haolin Zhuang, Liyang Chen, et al.
Publication Date: 2024/4/14

Multi-view MidiVAE: Fusing Track- and Bar-view Representations for Long Multi-track Symbolic Music Generation
Journal: arXiv preprint arXiv:2401.07532
Authors: Zhiwei Lin, Jun Chen, Boshi Tang, Binzhu Sha, Jing Yang, et al.
Publication Date: 2024/1/15

Explore 3D Dance Generation via Reward Model from Automatically-Ranked Demonstrations
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Authors: Zilin Wang, Haolin Zhuang, Lu Li, Yinmin Zhang, Junjie Zhong, et al.
Publication Date: 2024/3/25

Consistent and Relevant: Rethink the Query Embedding in General Sound Separation
Authors: Yuanyuan Wang, Hangting Chen, Dongchao Yang, Jianwei Yu, Chao Weng, et al.
Publication Date: 2024/4/14

SimCalib: Graph Neural Network Calibration based on Similarity between Nodes
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Authors: Boshi Tang, Zhiyong Wu, Xixin Wu, Qiaochu Huang, Jun Chen, et al.
Publication Date: 2024/3/24

Freetalker: Controllable Speech and Text-Driven Gesture Generation Based on Diffusion Models for Enhanced Speaker Naturalness
Journal: arXiv preprint arXiv:2401.03476
Authors: Sicheng Yang, Zunnan Xu, Haiwei Xue, Yongkang Cheng, Shaoli Huang, et al.
Publication Date: 2024/1/7

Neural Concatenative Singing Voice Conversion: Rethinking Concatenation-based Approach for One-shot Singing Voice Conversion
Authors: Binzhu Sha, Xu Li, Zhiyong Wu, Ying Shan, Helen Meng
Publication Date: 2024/4/14

SECap: Speech Emotion Captioning with Large Language Model
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Authors: Yaoxun Xu, Hangting Chen, Jianwei Yu, Qiaochu Huang, Zhiyong Wu, et al.
Publication Date: 2024/3/24

StyleSpeech: Self-supervised Style Enhancing with VQ-VAE-based Pre-training for Expressive Audiobook Speech Synthesis
Authors: Xueyuan Chen, Xi Wang, Shaofei Zhang, Lei He, Zhiyong Wu, et al.
Publication Date: 2024/4/14

Inter-Subnet: Speech Enhancement with Subband Interaction
Authors: Jun Chen, Wei Rao, Zilin Wang, Jiuxin Lin, Zhiyong Wu, et al.
Publication Date: 2023/6/4

WavSyncSwap: End-To-End Portrait-Customized Audio-Driven Talking Face Generation
Authors: Weihong Bao, Liyang Chen, Chaoyong Zhou, Sicheng Yang, Zhiyong Wu
Publication Date: 2023/6/4

SnakeGAN: A Universal Vocoder Leveraging DDSP Prior Knowledge and Periodic Inductive Bias
Authors: Sipan Li, Songxiang Liu, Luwen Zhang, Xiang Li, Yanyao Bian, et al.
Publication Date: 2023/7/10

Text-Only Domain Adaptation for End-to-End Speech Recognition through Down-Sampling Acoustic Representation
Authors: Jiaxu Zhu, Weinan Tong, Yaoxun Xu, Changhe Song, Zhiyong Wu, et al.
Publication Date: 2023/8/22

Diverse and Expressive Speech Prosody Prediction with Denoising Diffusion Probabilistic Model
Authors: Xiang Li, Songxiang Liu, Max WY Lam, Zhiyong Wu, Chao Weng, et al.
Publication Date: 2023

AdaMesh: Personalized Facial Expressions and Head Poses for Speech-Driven 3D Facial Animation
Journal: arXiv preprint arXiv:2310.07236
Authors: Liyang Chen, Weihong Bao, Shun Lei, Boshi Tang, Zhiyong Wu, et al.
Publication Date: 2023/10/11

Co-Authors

Hung-yi Lee, National Taiwan University (H-index: 47)

Jia Jia (贾珈), Tsinghua University (H-index: 38)

Xixin Wu, University of Cambridge (H-index: 19)

Yishuang Ning, Tsinghua University (H-index: 9)

Yixuan Zhou (周逸轩), Tsinghua University (H-index: 5)

Xi Ma, Tsinghua University (H-index: 5)
