Yunhao Tang
Columbia University in the City of New York
H-index: 15
North America-United States
Top articles of Yunhao Tang
Title | Journal | Author(s) | Publication Date |
---|---|---|---|
Generalized Preference Optimization: A Unified Approach to Offline Alignment | arXiv preprint arXiv:2402.05749 | Yunhao Tang Zhaohan Daniel Guo Zeyu Zheng Daniele Calandriello Rémi Munos | 2024/2/8 |
Learning Uncertainty-Aware Temporally-Extended Actions | Proceedings of the AAAI Conference on Artificial Intelligence | Joongkyu Lee Seung Joon Park Yunhao Tang Min-hwan Oh | 2024/3/24 |
Off-policy Distributional Q(λ): Distributional RL without Importance Sampling | arXiv preprint arXiv:2402.05766 | Yunhao Tang Mark Rowland Rémi Munos Bernardo Ávila Pires Will Dabney | 2024/2/8 |
Human Alignment of Large Language Models through Online Preference Optimisation | arXiv preprint arXiv:2403.08635 | Daniele Calandriello Daniel Guo Remi Munos Mark Rowland Yunhao Tang | 2024/3/13 |
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | arXiv preprint arXiv:2403.05530 | Machel Reid Nikolay Savinov Denis Teplyashin Dmitry Lepikhin Timothy Lillicrap | 2024/3/8 |
A Distributional Analogue to the Successor Representation | arXiv preprint arXiv:2402.08530 | Harley Wiltzer* Jesse Farebrother* Arthur Gretton Yunhao Tang André Barreto | 2024/2/13 |
Near-Minimax-Optimal Distributional Reinforcement Learning with a Generative Model | arXiv preprint arXiv:2402.07598 | Mark Rowland Li Kevin Wenliang Rémi Munos Clare Lyle Yunhao Tang | 2024/2/12 |
The statistical benefits of quantile temporal-difference learning for value estimation | | Mark Rowland Yunhao Tang Clare Lyle Rémi Munos Marc G Bellemare | 2023/7/3 |
Fast rates for maximum entropy exploration | | Daniil Tiapkin Denis Belomestny Daniele Calandriello Eric Moulines Remi Munos | 2023/3/14 |
Towards a better understanding of representation dynamics under TD-learning | | Yunhao Tang Rémi Munos | 2023/7/3 |
The edge of orthogonality: a simple view of what makes BYOL tick | | Pierre Harvey Richemond Allison Tam Yunhao Tang Florian Strub Bilal Piot | 2023/7/3 |
DoMo-AC: doubly multi-step off-policy actor-critic algorithm | | Yunhao Tang Tadashi Kozuno Mark Rowland Anna Harutyunyan Rémi Munos | 2023/7/3 |
Gemini: a family of highly capable multimodal models | arXiv preprint arXiv:2312.11805 | Gemini Team Rohan Anil Sebastian Borgeaud Yonghui Wu Jean-Baptiste Alayrac | 2023/12/19 |
Understanding self-predictive learning for reinforcement learning | International Conference on Machine Learning (ICML23) | Yunhao Tang Zhaohan Daniel Guo Pierre Harvey Richemond Bernardo Ávila Pires Yash Chandak | 2023/1/6 |
VA-learning as a more efficient alternative to Q-learning | | Yunhao Tang Rémi Munos Mark Rowland Michal Valko | 2023/7/3 |
Nash learning from human feedback | arXiv preprint arXiv:2312.00886 | Rémi Munos Michal Valko Daniele Calandriello Mohammad Gheshlaghi Azar Mark Rowland | 2023/12/1 |
An analysis of quantile temporal-difference learning | arXiv preprint arXiv:2301.04462 | Mark Rowland Rémi Munos Mohammad Gheshlaghi Azar Yunhao Tang Georg Ostrovski | 2023/1/11 |
Regularization and variance-weighted regression achieves minimax optimality in linear MDPs: theory and practice | | Toshinori Kitamura Tadashi Kozuno Yunhao Tang Nino Vieillard Michal Valko | 2023/7/3 |
Quantile credit assignment | | Thomas Mesnard Wenqi Chen Alaa Saade Yunhao Tang Mark Rowland | 2023/7/3 |
Representations and exploration for deep reinforcement learning using singular value decomposition | | Yash Chandak Shantanu Thakoor Zhaohan Daniel Guo Yunhao Tang Remi Munos | 2023/7/3 |