Bowen Zhang
University of Southern California
H-index: 13
North America-United States
Top articles of Bowen Zhang
Title | Journal | Author(s) | Publication Date |
---|---|---|---|
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training | arXiv preprint arXiv:2403.09611 | Brandon McKinzie Zhe Gan Jean-Philippe Fauconnier Sam Dodge Bowen Zhang | 2024/3/14 |
Less is More: Removing Text-regions Improves CLIP Training Efficiency and Robustness | arXiv preprint arXiv:2305.05095 | Liangliang Cao Bowen Zhang Chen Chen Yinfei Yang Xianzhi Du | 2023/5/8 |
From scarcity to efficiency: Improving CLIP training via visual-enriched captions | arXiv preprint arXiv:2310.07699 | Zhengfeng Lai* Haotian Zhang* Wentao Wu Haoping Bai Aleksei Timofeev | 2023/10/11 |
STAIR: Learning Sparse Text and Image Representation in Grounded Tokens | arXiv preprint arXiv:2301.13081 | Chen Chen Bowen Zhang Liangliang Cao Jiguang Shen Tom Gunter | 2023/1/30 |
Ferret: Refer and ground anything anywhere at any granularity | arXiv preprint arXiv:2310.07704 | Haoxuan You Haotian Zhang Zhe Gan Xianzhi Du Bowen Zhang | 2023/10/11 |
Compressing LLMs: The Truth is Rarely Pure and Never Simple | arXiv preprint arXiv:2310.01382 | Ajay Jaiswal Zhe Gan Xianzhi Du Bowen Zhang Zhangyang Wang | 2023/10/2 |
Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts | arXiv 2023 | Erik Daxberger Floris Weers Bowen Zhang Tom Gunter Ruoming Pang | 2023/9/8 |
MOFI: Learning Image Representations from Noisy Entity Annotated Images | arXiv preprint arXiv:2306.07952 | Wentao Wu Aleksei Timofeev Chen Chen Bowen Zhang Kun Duan | 2023/6/13 |
Hierarchical video encoders | | | 2022/12/20 |
Visual Representation Learning with Structural Prior | | Bowen Zhang | 2022 |
Systematic Generalization on gSCAN: What is Nearly Solved and What is Next? | arXiv preprint arXiv:2109.12243 | Linlu Qiu Hexiang Hu Bowen Zhang Peter Shaw Fei Sha | 2021/9/25 |
Co-training Transformer with Videos and Images Improves Action Recognition | arXiv preprint arXiv:2112.07175 | Bowen Zhang Jiahui Yu Christopher Fifty Wei Han Andrew M Dai | 2021/12/14 |
Visually Grounded Concept Composition | | Bowen Zhang Hexiang Hu Linlu Qiu Peter Shaw Fei Sha | 2021/9/29 |
A Hierarchical Multi-Modal Encoder for Moment Localization in Video Corpus | arXiv preprint arXiv:2011.09046 | Bowen Zhang Hexiang Hu Joonseok Lee Ming Zhao Sheide Chammas | 2020/11/18 |
Online Action Detection in Streaming Videos with Time Buffers | arXiv preprint arXiv:2010.03016 | Bowen Zhang Hao Chen Meng Wang Yuanjun Xiong | 2020/10/6 |
Learning to Represent Image and Text with Denotation Graph | | Bowen Zhang Hexiang Hu Vihan Jain Eugene Ie Fei Sha | 2020 |