Zhaoyang Zeng
Sun Yat-Sen University
H-index: 12
Asia-China
Top articles of Zhaoyang Zeng
T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy
arXiv preprint arXiv:2403.14610
2024/3/21
TAPTR: Tracking Any Point with Transformers as Detection
arXiv preprint arXiv:2403.13042
2024/3/19
Grounded SAM: Assembling open-world models for diverse visual tasks
arXiv preprint arXiv:2401.14159
2024/1/25
T-Rex: Counting by Visual Prompting
arXiv preprint arXiv:2311.13596
2023/11/22
SMP Challenge: An Overview and Analysis of Social Media Prediction Challenge
2023/10/26
detrex: Benchmarking detection transformers
arXiv preprint arXiv:2306.07265
2023/6/12
A strong and reproducible object detector with only public datasets
arXiv preprint arXiv:2304.13027
2023/4/25
Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection
arXiv preprint arXiv:2303.05499
2023/3/9
DFA3D: 3D Deformable Attention For 2D-to-3D Feature Lifting
2023
Detection transformer with stable matching
2023
Stay in Grid: Improving Video Captioning via Fully Grid-Level Representation
IEEE Transactions on Circuits and Systems for Video Technology
2022/12/27
Tencent-MVSE: A large-scale benchmark dataset for multi-modal video similarity evaluation
2022
Contrastive learning of global and local video representations
Advances in Neural Information Processing Systems
2021/12/6
Daniel McDuff
H-Index: 34
Multi-modal representation learning for video advertisement content structuring
2021/10/17
Daya Guo
H-Index: 7
CLIP4Caption++: Multi-CLIP for video caption
arXiv preprint arXiv:2110.05204
2021/10/11
Be specific, be clear: Bridging machine and human captions by scene-guided transformer
2021/8/21
Reference-based defect detection network
IEEE Transactions on Image Processing
2021/7/19
GarbageNet: a unified learning framework for robust garbage classification
IEEE Transactions on Artificial Intelligence
2021/5/18
Seeing out of the box: End-to-end pre-training for vision-language representation learning
2021
Suppressing mislabeled data via grouping and self-attention
2020/8/23