Benjamin Van Roy
Stanford University
H-index: 58
North America, United States
Top articles of Benjamin Van Roy
Title | Venue | Author(s) | Publication Date |
---|---|---|---|
Adaptive Crowdsourcing Via Self-Supervised Learning | arXiv preprint arXiv:2401.13239 | Anmol Kagrecha, Henrik Marklund, Benjamin Van Roy, Hong Jun Jeon, Richard Zeckhauser | 2024/1/24 |
Leveraging offline training data and agent competency measures to improve online learning | | | 2024/4/25 |
Bayesian reinforcement learning with limited cognitive load | Open Mind | Dilip Arumugam, Mark K Ho, Noah D Goodman, Benjamin Van Roy | 2024/4/16 |
A definition of continual reinforcement learning | Advances in Neural Information Processing Systems | David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado P van Hasselt | 2024/2/13 |
Efficient Exploration for LLMs | arXiv preprint arXiv:2402.00396 | Vikranth Dwaracherla, Seyed Mohammad Asghari, Botao Hao, Benjamin Van Roy | 2024/2/1 |
An Information-Theoretic Analysis of In-Context Learning | arXiv preprint arXiv:2401.15530 | Hong Jun Jeon, Jason D Lee, Qi Lei, Benjamin Van Roy | 2024/1/28 |
Approximate Thompson sampling via epistemic neural networks | | Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi | 2023/7/2 |
Maintaining plasticity via regenerative regularization | arXiv preprint arXiv:2308.11958 | Saurabh Kumar, Henrik Marklund, Benjamin Van Roy | 2023/8/23 |
Shattering the agent-environment interface for fine-tuning inclusive language models | arXiv preprint arXiv:2305.11455 | Wanqiao Xu, Shi Dong, Dilip Arumugam, Benjamin Van Roy | 2023/5/19 |
RLHF and IIA: Perverse Incentives | arXiv e-prints | Wanqiao Xu, Shi Dong, Xiuyuan Lu, Grace Lam, Zheng Wen | 2023/12 |
On the Convergence of Bounded Agents | arXiv preprint arXiv:2307.11044 | David Abel, André Barreto, Hado van Hasselt, Benjamin Van Roy, Doina Precup | 2023/7/20 |
Nonstationary bandit learning via predictive sampling | | Yueyang Liu, Benjamin Van Roy, Kuang Xu | 2023/4/11 |
Continual learning as computationally constrained reinforcement learning | arXiv preprint arXiv:2307.04345 | Saurabh Kumar, Henrik Marklund, Ashish Rao, Yifan Zhu, Hong Jun Jeon | 2023/7/10 |
Scalable neural contextual bandit for recommender systems | arXiv preprint arXiv:2306.14834 | Zheqing Zhu, Benjamin Van Roy | 2023/6/26 |
A definition of non-stationary bandits | arXiv preprint arXiv:2302.12202 | Yueyang Liu, Xu Kuang, Benjamin Van Roy | 2023/2/23 |
Reinforcement learning, bit by bit | Foundations and Trends® in Machine Learning | Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband | 2023/7/10 |
Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling | arXiv preprint arXiv:2310.07786 | Zheqing Zhu, Yueyang Liu, Xu Kuang, Benjamin Van Roy | 2023/10/11 |
Epistemic neural networks | Advances in Neural Information Processing Systems | Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi | 2024/2/13 |
Leveraging demonstrations to improve online learning: Quality matters | | Botao Hao, Rahul Jain, Tor Lattimore, Benjamin Van Roy, Zheng Wen | 2023/7/3 |
Deep exploration for recommendation systems | | Zheqing Zhu, Benjamin Van Roy | 2023/9/14 |