
Alex Hauptmann

Carnegie Mellon University

H-index: 93

North America-United States

About Alex Hauptmann

Alex Hauptmann is a researcher at Carnegie Mellon University who specializes in multimedia. He has an h-index of 93 overall and 59 since 2020.

His recent articles include:

Spae: Semantic pyramid autoencoder for multimodal generation with frozen llms

MM-TTS: A Unified Framework for Multimodal, Prompt-Induced Emotional Text-to-Speech Synthesis

Hyperbolic vs Euclidean Embeddings in Few-Shot Learning: Two Sides of the Same Coin

PhISANet: Phonetically Informed Speech Animation Network

Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward

Adversarially Masked Video Consistency for Unsupervised Domain Adaptation

Breaking the limits of text-conditioned 3d motion synthesis with elaborative descriptions

Towards open-domain twitter user profile inference

Alex Hauptmann Information

University	Carnegie Mellon University
Position	___
Citations (all)	34449
Citations (since 2020)	14801
Cited By	26256
h-index (all)	93
h-index (since 2020)	59
i10-index (all)	364
i10-index (since 2020)	185

Alex Hauptmann Skills & Research Interests

Multimedia

Top articles of Alex Hauptmann

Title	Journal	Author(s)	Publication Date
Spae: Semantic pyramid autoencoder for multimodal generation with frozen llms		Lijun Yu Yong Cheng Zhiruo Wang Vivek Kumar Wolfgang Macherey ...	2023
MM-TTS: A Unified Framework for Multimodal, Prompt-Induced Emotional Text-to-Speech Synthesis	arXiv preprint arXiv:2404.18398	Xiang Li Zhi-Qi Cheng Jun-Yan He Xiaojiang Peng Alexander G Hauptmann	2024/4/29
Hyperbolic vs Euclidean Embeddings in Few-Shot Learning: Two Sides of the Same Coin		Gabriel Moreira Manuel Marques João Paulo Costeira Alexander Hauptmann	2024
PhISANet: Phonetically Informed Speech Animation Network		Salvador Medina Sarah L Taylor Carsten Stoll Gareth Edwards Alex Hauptmann ...	2024/4/14
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward	arXiv preprint arXiv:2404.01258	Ruohong Zhang Liangke Gui Zhiqing Sun Yihao Feng Keyang Xu ...	2024/4/1
Adversarially Masked Video Consistency for Unsupervised Domain Adaptation	arXiv preprint arXiv:2403.16242	Xiaoyu Zhu Junwei Liang Po-Yao Huang Alex Hauptmann	2024/3/24
Breaking the limits of text-conditioned 3d motion synthesis with elaborative descriptions		Yijun Qian Jack Urbanek Alexander G Hauptmann Jungdam Won	2023
Towards open-domain twitter user profile inference		Haoyang Wen Zhenxin Xiao Eduard Hovy Alexander G Hauptmann	2023/7
Chartreader: A unified framework for chart derendering and comprehension without heuristic rules		Zhi-Qi Cheng Qi Dai Alexander G Hauptmann	2023
Robust Automatic Detection of Traffic Activity		Alexander Hauptmann Lijun Yu Wenhe Liu Yijun Qian Zhiqi Cheng ...	2023/6/30
Magvit: Masked generative video transformer		Lijun Yu Yong Cheng Kihyuk Sohn José Lezama Han Zhang ...	2023
Document Entity Retrieval with Massive and Noisy Pre-training	arXiv preprint arXiv:2306.08937	Lijun Yu Jin Miao Xiaoyu Sun Jiayi Chen Alexander G Hauptmann ...	2023/6/15
Documentnet: Bridging the data gap in document pre-training		Lijun Yu Jin Miao Xiaoyu Sun Jiayi Chen Alexander G Hauptmann ...	2023/12
Stmt: A spatial-temporal mesh transformer for mocap-based action recognition	CVPR 2023	Xiaoyu Zhu Po-Yao Huang Junwei Liang Celso M de Melo Alexander Hauptmann	2023/3/31
Leveraging body pose estimation for gesture recognition in human-robot interaction using synthetic data		Xiaoyu Zhu Celso M de Melo Alexander Hauptmann	2023/6/13
Language Model Beats Diffusion--Tokenizer is Key to Visual Generation		Lijun Yu José Lezama Nitesh B. Gundavarapu Luca Versari Kihyuk Sohn ...	2024
Data processing system for classifying keyed data representing inhaler device operation			2023/6/13
Zero-shot and few-shot stance detection on varied topics via conditional generation		Haoyang Wen Alexander G Hauptmann	2023/7
Vehicle and Pedestrian Trajectory and Gap Estimation for Traffic Conflict Prediction		Alexander Hauptmann	2022/2/1
Gsrformer: Grounded situation recognition transformer with alternate semantic attention refinement		Zhi-Qi Cheng Qi Dai Siyao Li Teruko Mitamura Alexander Hauptmann	2022/10/10