Po-Yao (Bernie) Huang
Carnegie Mellon University
H-index: 23
North America-United States
Top articles of Po-Yao (Bernie) Huang
Title | Journal | Author(s) | Publication Date |
---|---|---|---|
Adversarially Masked Video Consistency for Unsupervised Domain Adaptation | arXiv preprint arXiv:2403.16242 | Xiaoyu Zhu Junwei Liang Po-Yao Huang Alex Hauptmann | 2024/3/24 |
MoDE: CLIP Data Experts via Clustering | arXiv preprint arXiv:2404.16030 | Jiawei Ma Po-Yao Huang Saining Xie Shang-Wen Li Luke Zettlemoyer | 2024/4/24 |
AV-SUPERB: A multi-task evaluation benchmark for audio-visual representation models | | Yuan Tseng Layne Berry Yi-Ting Chen I-Hsiang Chiu Hsuan-Hao Lin | 2024/4/14 |
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild | arXiv preprint arXiv:2403.16973 | Puyuan Peng Po-Yao Huang Daniel Li Abdelrahman Mohamed David Harwath | 2024/3/25 |
SeamlessM4T-Massively Multilingual & Multimodal Machine Translation | arXiv preprint arXiv:2308.11596 | Loïc Barrault Yu-An Chung Mariano Coria Meglioli David Dale Ning Dong | 2023/8/22 |
Diffusion Models as Masked Autoencoders | ICCV | Chen Wei Karttikeya Mangalam Po-Yao Huang Yanghao Li Haoqi Fan | 2023 |
Generating Hashtags for Short-form Videos with Guided Signals | | Tiezheng Yu Hanchao Yu Davis Liang Yuning Mao Shaoliang Nie | 2023/7 |
CiT: Curation in training for effective vision-language data | | Hu Xu Saining Xie Po-Yao Huang Licheng Yu Russell Howes | 2023 |
STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition | CVPR 2023 | Xiaoyu Zhu Po-Yao Huang Junwei Liang Celso M de Melo Alexander Hauptmann | 2023/3/31 |
Data processing system for classifying keyed data representing inhaler device operation | | | 2023/6/13 |
FLAP: Fast language-audio pre-training | | Ching-Feng Yeh Po-Yao Huang Vasu Sharma Shang-Wen Li Gargi Ghosh | 2023/12/16 |
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles | ICML | Chaitanya Ryali Yuan-Ting Hu Daniel Bolya Chen Wei Haoqi Fan | 2023 |
Demystifying CLIP data | arXiv preprint arXiv:2309.16671 | Hu Xu Saining Xie Xiaoqing Ellen Tan Po-Yao Huang Russell Howes | 2023/9/28 |
DINOv2: Learning robust visual features without supervision | arXiv preprint arXiv:2304.07193 | Maxime Oquab Timothée Darcet Théo Moutakanni Huy Vo Marc Szafraniec | 2023/4/14 |
Video pivoting unsupervised multi-modal machine translation | IEEE Transactions on Pattern Analysis and Machine Intelligence | Mingjie Li Po-Yao Huang Xiaojun Chang Junjie Hu Yi Yang | 2022/6/9 |
On adversarial robustness of large-scale audio visual learning | | Juncheng B Li Shuhui Qu Xinjian Li Po-Yao Bernie Huang Florian Metze | 2022/5/23 |
AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification | arXiv preprint arXiv:2203.13448 | Juncheng B Li Shuhui Qu Po-Yao Huang Florian Metze | 2022/3/25 |
MAViL: Masked Audio-Video Learners | Advances in Neural Information Processing Systems | Po-Yao Huang Vasu Sharma Hu Xu Chaitanya Ryali Yanghao Li | 2024/2/13 |
CM3: A causal masked multimodal model of the internet | arXiv preprint arXiv:2201.07520 | Armen Aghajanyan Bernie Huang Candace Ross Vladimir Karpukhin Hu Xu | 2022/1/19 |
Masked autoencoders that listen | Advances in Neural Information Processing Systems | Po-Yao Huang Hu Xu Juncheng Li Alexei Baevski Michael Auli | 2022/12/6 |