Dan Hendrycks
University of California, Berkeley
H-index: 31
North America-United States
Top articles of Dan Hendrycks
Title | Journal | Author(s) | Publication Date |
---|---|---|---|
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression | arXiv preprint arXiv:2403.15447 | Junyuan Hong Jinhao Duan Chenhui Zhang Zhangheng Li Chulin Xie | 2024/3/18 |
The WMDP benchmark: Measuring and reducing malicious use with unlearning | arXiv preprint arXiv:2403.03218 | Nathaniel Li Alexander Pan Anjali Gopal Summer Yue Daniel Berrios | 2024/3/5 |
Uncovering Latent Human Wellbeing in Language Model Embeddings | arXiv preprint arXiv:2402.11777 | Pedro Freire ChengCheng Tan Adam Gleave Dan Hendrycks Scott Emmons | 2024/2/19 |
HarmBench: A standardized evaluation framework for automated red teaming and robust refusal | arXiv preprint arXiv:2402.04249 | Mantas Mazeika Long Phan Xuwang Yin Andy Zou Zifan Wang | 2024/2/6 |
Programmatic Evaluation of Rule-Following Behavior | | Norman Mu Sarah Li Chen Zifan Wang Sizhe Chen Dan Hendrycks | 2023/10/13 |
Identifying and mitigating the security risks of generative AI | Foundations and Trends® in Privacy and Security | Clark Barrett Brad Boyd Elie Bursztein Nicholas Carlini Brad Chen | 2023/12/13 |
DecodingTrust: A comprehensive assessment of trustworthiness in GPT models | arXiv preprint arXiv:2306.11698 | Boxin Wang Weixin Chen Hengzhi Pei Chulin Xie Mintong Kang | 2023/6/20 |
How Hard is Trojan Detection in DNNs? Fooling Detectors With Evasive Trojans | | Mantas Mazeika Andy Zou Akul Arora Pavel Pleskov Dawn Song | 2023/10/13 |
Can LLMs Follow Simple Rules? | arXiv preprint arXiv:2311.04235 | Norman Mu Sarah Chen Zifan Wang Sizhe Chen David Karamardian | 2023/11/6 |
Natural Selection Favors AIs over Humans | arXiv preprint arXiv:2303.16200 | Dan Hendrycks | 2023/3/28 |
Representation engineering: A top-down approach to AI transparency | arXiv preprint arXiv:2310.01405 | Andy Zou Long Phan Sarah Chen James Campbell Phillip Guo | 2023/10/2 |
Robustness Evaluation of Proxy Models against Adversarial Optimization | | Andy Zou Long Phan Nathaniel Li Jun Shern Chan Mantas Mazeika | 2023/10/13 |
MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding | EMNLP | Steven H Wang Antoine Scardigli Leonard Tang Wei Chen Dimitry Levkin | 2023/1/2 |
AI deception: A survey of examples, risks, and potential solutions | Patterns | Peter S Park Simon Goldstein Aidan O'Gara Michael Chen Dan Hendrycks | 2023/8/28 |
Enhancing Neural Network Transparency through Representation Analysis | | Andy Zou Long Phan Sarah Li Chen James Campbell Phillip Huang Guo | 2023/10/13 |
AI risk-management standards profile for general-purpose AI systems (GPAIS) and foundation models | Center for Long-Term Cybersecurity, UC Berkeley. https://perma.cc/8W6P-2UUK | Anthony M Barrett Jessica Newman Brandie Nonnecke D Hendrycks Evan R Murphy | 2023 |
Do the rewards justify the means? Measuring trade-offs between rewards and ethical behavior in the MACHIAVELLI benchmark | ICML 2023 | Alexander Pan Chan Jun Shern Andy Zou Nathaniel Li Steven Basart | 2023/4/6 |
Evaluating Robustness to Unforeseen Adversarial Attacks | | Maximilian Kaufmann Daniel Kang Yi Sun Xuwang Yin Steven Basart | 2023/10/13 |
An overview of catastrophic AI risks | | Dan Hendrycks Mantas Mazeika Thomas Woodside | 2023/6/21 |
Certified adversarial defenses meet out-of-distribution corruptions: Benchmarking robustness and simple baselines | European Conference on Computer Vision (ECCV) | Jiachen Sun Akshay Mehra Bhavya Kailkhura Pin-Yu Chen Dan Hendrycks | 2022 |