Zhuowen Tu
University of California, San Diego
H-index: 73
North America-United States
Top articles of Zhuowen Tu
Title | Venue | Author(s) | Publication Date |
---|---|---|---|
Enhancing Vision-Language Pre-training with Rich Supervisions | arXiv preprint arXiv:2403.03346 | Yuan Gao Kunyu Shi Pengkai Zhu Edouard Belval Oren Nuriel | 2024/3/5 |
Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model | arXiv preprint arXiv:2404.18065 | Xiaolong Li Jiawei Mo Ying Wang Chethan Parameshwara Xiaohan Fei | 2024/4/28 |
Non-autoregressive Sequence-to-Sequence Vision-Language Models | CVPR | Kunyu Shi Qi Dong Luis Goncalves Zhuowen Tu Stefano Soatto | 2024/3/4 |
ELODI: Ensemble logit difference inhibition for positive-congruent training | IEEE Transactions on Pattern Analysis and Machine Intelligence | Yue Zhao Yantao Shen Yuanjun Xiong Shuo Yang Wei Xia | 2024/4/23 |
AffordanceLLM: Grounding affordance from vision language models | arXiv preprint arXiv:2401.06341 | Shengyi Qian Weifeng Chen Min Bai Xiong Zhou Zhuowen Tu | 2024/1/12 |
On the Scalability of Diffusion-based Text-to-Image Generation | arXiv preprint arXiv:2404.02883 | Hao Li Yang Zou Ying Wang Orchid Majumder Yusheng Xie | 2024/4/3 |
HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data | CVPR | Mengqi Zhang Yang Fu Zheng Ding Sifei Liu Zhuowen Tu | 2024/3/18 |
Bayesian Diffusion Models for 3D Shape Reconstruction | CVPR | Haiyang Xu Yu Lei Zeyuan Chen Xiang Zhang Yue Zhao | 2024/3/11 |
DocTr: Document Transformer for Structured Information Extraction in Documents | ICCV-23 | Haofu Liao Aruni RoyChowdhury Weijian Li Ankan Bansal Yuting Zhang | 2023/7/16 |
Uni-3D: A Universal Model for Panoptic 3D Scene Reconstruction | | Xiang Zhang Zeyuan Chen Fangyin Wei Zhuowen Tu | 2023 |
When is multilinguality a curse? Language modeling for 250 high- and low-resource languages | arXiv preprint arXiv:2311.09205 | Tyler A Chang Catherine Arnett Zhuowen Tu Benjamin K Bergen | 2023/11/15 |
Musketeer: Joint Training/Inference for Multi-task Vision-Language Model with Task Explanation Prompts | | Zhaoyang Zhang Yantao Shen Kunyu Shi Zhaowei Cai Jun Fang | 2023/10/13 |
MasQCLIP for open-vocabulary universal image segmentation | | Xin Xu Tianyi Xiong Zheng Ding Zhuowen Tu | 2023 |
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability | ICCV | Xuanlin Li Yunhao Fang Minghua Liu Zhan Ling Zhuowen Tu | 2023/7/6 |
Characterizing Learning Curves During Language Model Pre-Training: Learning, Forgetting, and Stability | arXiv preprint arXiv:2308.15419 | Tyler A Chang Zhuowen Tu Benjamin K Bergen | 2023/8/29 |
SkeleTR: Towards Skeleton-based Action Recognition in the Wild | | Haodong Duan Mingze Xu Bing Shuai Davide Modolo Zhuowen Tu | 2023 |
Open-Vocabulary Universal Image Segmentation with MaskCLIP | ICML | Zheng Ding Jieke Wang Zhuowen Tu | 2023/6/15 |
Object-centric multiple object tracking | | Zixu Zhao Jiaze Wang Max Horn Yizhuo Ding Tong He | 2023 |
DiffusionRig: Learning Personalized Priors for Facial Appearance Editing | CVPR | Zheng Ding Xuaner Zhang Zhihao Xia Lars Jebe Zhuowen Tu | 2023/4/13 |
BLIVA: A simple multimodal LLM for better handling of text-rich visual questions | Proceedings of the AAAI Conference on Artificial Intelligence | Wenbo Hu Yifan Xu Yi Li Weiyue Li Zeyuan Chen | 2024/3/24 |