
Bowen Zhang

University of Southern California

H-index: 13

North America-United States

About Bowen Zhang

Bowen Zhang is a researcher at the University of Southern California with an h-index of 13 and a recent h-index of 12 (since 2020). He specializes in computer vision, action recognition, and natural language processing.

His recent articles include:

MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training

Less is More: Removing Text-regions Improves CLIP Training Efficiency and Robustness

From scarcity to efficiency: Improving CLIP training via visual-enriched captions

STAIR: Learning Sparse Text and Image Representation in Grounded Tokens

Ferret: Refer and ground anything anywhere at any granularity

Compressing LLMs: The Truth is Rarely Pure and Never Simple

Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts

MOFI: Learning Image Representations from Noisy Entity Annotated Images

Bowen Zhang Information

University	University of Southern California
Position	___
Citations(all)	1253
Citations(since 2020)	1033
Cited By	539
hIndex(all)	13
hIndex(since 2020)	12
i10Index(all)	13
i10Index(since 2020)	12

Bowen Zhang Skills & Research Interests

Computer Vision

Action Recognition

Natural Language Processing

Top articles of Bowen Zhang

Title	Journal	Author(s)	Publication Date
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training	arXiv preprint arXiv:2403.09611	Brandon McKinzie, Zhe Gan, Jean-Philippe Fauconnier, Sam Dodge, Bowen Zhang, ...	2024/3/14
Less is More: Removing Text-regions Improves CLIP Training Efficiency and Robustness	arXiv preprint arXiv:2305.05095	Liangliang Cao, Bowen Zhang, Chen Chen, Yinfei Yang, Xianzhi Du, ...	2023/5/8
From scarcity to efficiency: Improving CLIP training via visual-enriched captions	arXiv preprint arXiv:2310.07699	Zhengfeng Lai*, Haotian Zhang*, Wentao Wu, Haoping Bai, Aleksei Timofeev, ...	2023/10/11
STAIR: Learning Sparse Text and Image Representation in Grounded Tokens	arXiv preprint arXiv:2301.13081	Chen Chen, Bowen Zhang, Liangliang Cao, Jiguang Shen, Tom Gunter, ...	2023/1/30
Ferret: Refer and ground anything anywhere at any granularity	arXiv preprint arXiv:2310.07704	Haoxuan You, Haotian Zhang, Zhe Gan, Xianzhi Du, Bowen Zhang, ...	2023/10/11
Compressing LLMs: The Truth is Rarely Pure and Never Simple	arXiv preprint arXiv:2310.01382	Ajay Jaiswal, Zhe Gan, Xianzhi Du, Bowen Zhang, Zhangyang Wang, ...	2023/10/2
Mobile V-MoEs: Scaling Down Vision Transformers via Sparse Mixture-of-Experts	arXiv 2023	Erik Daxberger, Floris Weers, Bowen Zhang, Tom Gunter, Ruoming Pang, ...	2023/9/8
MOFI: Learning Image Representations from Noisy Entity Annotated Images	arXiv preprint arXiv:2306.07952	Wentao Wu, Aleksei Timofeev, Chen Chen, Bowen Zhang, Kun Duan, ...	2023/6/13
Hierarchical video encoders			2022/12/20
Visual Representation Learning with Structural Prior		Bowen Zhang	2022
Systematic Generalization on gSCAN: What is Nearly Solved and What is Next?	arXiv preprint arXiv:2109.12243	Linlu Qiu, Hexiang Hu, Bowen Zhang, Peter Shaw, Fei Sha	2021/9/25
Co-training Transformer with Videos and Images Improves Action Recognition	arXiv preprint arXiv:2112.07175	Bowen Zhang, Jiahui Yu, Christopher Fifty, Wei Han, Andrew M Dai, ...	2021/12/14
Visually Grounded Concept Composition		Bowen Zhang, Hexiang Hu, Linlu Qiu, Peter Shaw, Fei Sha	2021/9/29
A Hierarchical Multi-Modal Encoder for Moment Localization in Video Corpus	arXiv preprint arXiv:2011.09046	Bowen Zhang, Hexiang Hu, Joonseok Lee, Ming Zhao, Sheide Chammas, ...	2020/11/18
Online Action Detection in Streaming Videos with Time Buffers	arXiv preprint arXiv:2010.03016	Bowen Zhang, Hao Chen, Meng Wang, Yuanjun Xiong	2020/10/6
Learning to Represent Image and Text with Denotation Graph		Bowen Zhang, Hexiang Hu, Vihan Jain, Eugene Ie, Fei Sha	2020