
Zhiyao Duan

University of Rochester

H-index: 30

North America, United States

About Zhiyao Duan

Zhiyao Duan is a researcher at the University of Rochester specializing in computer audition, music information retrieval, audio-visual processing, and machine learning. He has an h-index of 30 overall and 26 since 2020.

His recent articles span these areas and include:

SynthTab: Leveraging Synthesized Data for Guitar Tablature Transcription

MusicHiFi: Fast High-Fidelity Stereo Vocoding

Toward Fully Self-Supervised Multi-Pitch Estimation

Cacophony: An Improved Contrastive Audio-Text Model

Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech

SingFake: Singing Voice Deepfake Detection

Generalizing Voice Presentation Attack Detection to Unseen Synthetic Attacks and Channel Variation

SingNet: A Real-Time Singing Voice Beat and Downbeat Tracking System

Zhiyao Duan Information

University	University of Rochester
Department	Electrical and Computer Engineering
Citations (all)	4181
Citations (since 2020)	3286
Cited by	1832
h-index (all)	30
h-index (since 2020)	26
i10-index (all)	73
i10-index (since 2020)	59

Zhiyao Duan Skills & Research Interests

Computer Audition

Music Information Retrieval

Audio-Visual Processing

Machine Learning

Top articles of Zhiyao Duan

Title	Journal	Author(s)	Publication Date
SynthTab: Leveraging Synthesized Data for Guitar Tablature Transcription		Yongyi Zang, Yi Zhong, Frank Cwitkowitz, Zhiyao Duan	2024/4/14
MusicHiFi: Fast High-Fidelity Stereo Vocoding	arXiv preprint arXiv:2403.10493	Ge Zhu, Juan-Pablo Caceres, Zhiyao Duan, Nicholas J Bryan	2024/3/15
Toward Fully Self-Supervised Multi-Pitch Estimation	arXiv preprint arXiv:2402.15569	Frank Cwitkowitz, Zhiyao Duan	2024/2/23
Cacophony: An Improved Contrastive Audio-Text Model	arXiv preprint arXiv:2402.06986	Ge Zhu, Zhiyao Duan	2024/2/10
Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech	arXiv preprint arXiv:2311.14816	Enting Zhou, You Zhang, Zhiyao Duan	2023/11/24
SingFake: Singing Voice Deepfake Detection		Yongyi Zang, You Zhang, Mojtaba Heydari, Zhiyao Duan	2024/4/14
Generalizing Voice Presentation Attack Detection to Unseen Synthetic Attacks and Channel Variation		You Zhang, Fei Jiang, Ge Zhu, Xinhui Chen, Zhiyao Duan	2023/2/24
SingNet: A Real-Time Singing Voice Beat and Downbeat Tracking System		Mojtaba Heydari, Ju-Chiang Wang, Zhiyao Duan	2023/6/4
Euterpe: A Web Framework for Interactive Music Systems	Journal of the Audio Engineering Society	Yongyi Zang, Christodoulos Benetatos, Zhiyao Duan	2023/11/16
Transcription free filler word detection with Neural semi-CRFs		Ge Zhu, Yujia Yan, Juan-Pablo Caceres, Zhiyao Duan	2023/6/4
EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis	arXiv preprint arXiv:2311.08667	Ge Zhu, Yutong Wen, Marc-André Carbonneau, Zhiyao Duan	2023/11/15
SAMO: Speaker Attractor Multi-Center One-Class Learning For Voice Anti-Spoofing		Siwen Ding, You Zhang, Zhiyao Duan	2023/6/4
Harmonic Analysis With Neural Semi-CRF		Qiaoyu Yang, Frank Cwitkowitz, Zhiyao Duan	2023/11/5
HRTF Field: Unifying Measured HRTF Magnitude Representation with Neural Fields		You Zhang, Yuxiang Wang, Zhiyao Duan	2023/6/4
Mitigating Cross-Database Differences for Learning Unified HRTF Representation		Yutong Wen, You Zhang, Zhiyao Duan	2023/10/22
Grid-agnostic personalized head-related transfer function modeling with neural fields	The Journal of the Acoustical Society of America	You Zhang, Yuxiang Wang, Mark Bocko, Zhiyao Duan	2023/3/1
Phase perturbation improves channel robustness for speech spoofing countermeasures	arXiv preprint arXiv:2306.03389	Yongyi Zang, You Zhang, Zhiyao Duan	2023/6/6
Editorial for TISMIR Special Collection: Cultural Diversity in MIR Research		Zhiyao Duan, Peter van Kranenburg, Juhan Nam, Preeti Rao	2023/12/13
ControlVC: Zero-shot voice conversion with time-varying controls on pitch and speed	arXiv preprint arXiv:2209.11866	Meiying Chen, Zhiyao Duan	2022/9/23
A study of the robustness of raw waveform based speaker embeddings under mismatched conditions		Ge Zhu, Frank Cwitkowitz, Zhiyao Duan	2022/5/23