Anca D Dragan
University of California, Berkeley
H-index: 55
North America-United States
Top articles of Anca D Dragan
Title | Journal | Author(s) | Publication Date |
---|---|---|---|
Learning optimal advantage from preferences and mistaking it for reward | Proceedings of the AAAI Conference on Artificial Intelligence | W Bradley Knox Stephane Hatgis-Kessell Sigurdur Orn Adalgeirsson Serena Booth Anca Dragan | 2024/3/24 |
When Your AI Deceives You: Challenges with Partial Observability of Human Evaluators in Reward Learning | arXiv preprint arXiv:2402.17747 | Leon Lang Davis Foote Stuart Russell Anca Dragan Erik Jenner | 2024/2/27 |
Evaluating Frontier Models for Dangerous Capabilities | arXiv preprint arXiv:2403.13793 | Mary Phuong Matthew Aitchison Elliot Catt Sarah Cogan Alexandre Kaskasoli | 2024/3/20 |
Learning to influence human behavior with offline reinforcement learning | Advances in Neural Information Processing Systems | Joey Hong Sergey Levine Anca Dragan | 2024/2/13 |
Aligning Human and Robot Representations | | Andreea Bobu Andi Peng Pulkit Agrawal Julie A Shah Anca D Dragan | 2024/3/11 |
Optimizing Robot Behavior via Comparative Language Feedback | | Jeremy Tien Zhaojing Yang Miru Jun Stuart J Russell Anca Dragan | 2024 |
Defining Deception in Decision Making | | Marwa Abdulhai Micah Carroll Justin Svegliato Anca Dragan Sergey Levine | 2024/5/6 |
A Generalized Acquisition Function for Preference-based Reward Learning | arXiv preprint arXiv:2403.06003 | Evan Ellis Gaurav R Ghosal Stuart J Russell Anca Dragan Erdem Bıyık | 2024/3/9 |
Preventing reward hacking with occupancy measure regularization | arXiv preprint arXiv:2403.03185 | Cassidy Laidlaw Shivam Singhal Anca Dragan | 2024/3/5 |
Quantifying Assistive Robustness Via the Natural-Adversarial Frontier | | Jerry Zhi-Yang He Daniel S Brown Zackory Erickson Anca Dragan | 2023/12/2 |
SIRL: Similarity-based implicit representation learning | | Andreea Bobu Yi Liu Rohin Shah Daniel S Brown Anca D Dragan | 2023/3/13 |
Similarity-Based Representation Learning | | Yi Liu Andreea Bobu Anca Dragan | 2023/5/9 |
Contextual Reliability: When Different Features Matter in Different Contexts | | Gaurav Rohit Ghosal Amrith Setlur Daniel S Brown Anca Dragan Aditi Raghunathan | 2023/7/3 |
Confronting reward model overoptimization with constrained RLHF | arXiv preprint arXiv:2310.04373 | Ted Moskovitz Aaditya K Singh DJ Strouse Tuomas Sandholm Ruslan Salakhutdinov | 2023/10/6 |
Goal representations for instruction following: A semi-supervised language interface to control | | Vivek Myers Andre Wang He Kuan Fang Homer Rich Walke Philippe Hansen-Estruch | 2023/12/2 |
Learning representations that enable generalization in assistive tasks | | Jerry Zhi-Yang He Zackory Erickson Daniel S Brown Aditi Raghunathan Anca Dragan | 2023/3/6 |
Control for societal-scale challenges: Road map 2030 | | Andrew Alleyne Frank Allgöwer Aaron Ames Saurabh Amin James Anderson | 2023/5 |
Automatically auditing large language models via discrete optimization | International Conference on Machine Learning | Erik Jones Anca Dragan Aditi Raghunathan Jacob Steinhardt | 2023/3/8 |
Video-Guided Skill Discovery | | Manan Tomar Dibya Ghosh Vivek Myers Anca Dragan Matthew E Taylor | 2023/10/4 |
Zero-shot goal-directed dialogue via RL on imagined conversations | arXiv preprint arXiv:2311.05584 | Joey Hong Sergey Levine Anca Dragan | 2023/11/9 |