
Dan Hendrycks

University of California, Berkeley

H-index: 31

North America-United States

About Dan Hendrycks

Dan Hendrycks is a researcher at the University of California, Berkeley, with an h-index of 31 overall and 31 since 2020. He specializes in ML safety, AI safety, machine ethics, and ML reliability.

His recent articles include:

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

The WMDP benchmark: Measuring and reducing malicious use with unlearning

Uncovering Latent Human Wellbeing in Language Model Embeddings

HarmBench: A standardized evaluation framework for automated red teaming and robust refusal

Programmatic Evaluation of Rule-Following Behavior

Identifying and mitigating the security risks of generative AI

DecodingTrust: A comprehensive assessment of trustworthiness in GPT models

How Hard is Trojan Detection in DNNs? Fooling Detectors With Evasive Trojans

Dan Hendrycks Information

University	University of California, Berkeley
Position	PhD Student
Citations(all)	20942
Citations(since 2020)	20754
Cited By	4279
hIndex(all)	31
hIndex(since 2020)	31
i10Index(all)	39
i10Index(since 2020)	39

Dan Hendrycks Skills & Research Interests

ML Safety

AI Safety

Machine Ethics

ML Reliability

Top articles of Dan Hendrycks

Title	Journal	Author(s)	Publication Date
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression	arXiv preprint arXiv:2403.15447	Junyuan Hong Jinhao Duan Chenhui Zhang Zhangheng Li Chulin Xie ...	2024/3/18
The WMDP benchmark: Measuring and reducing malicious use with unlearning	arXiv preprint arXiv:2403.03218	Nathaniel Li Alexander Pan Anjali Gopal Summer Yue Daniel Berrios ...	2024/3/5
Uncovering Latent Human Wellbeing in Language Model Embeddings	arXiv preprint arXiv:2402.11777	Pedro Freire ChengCheng Tan Adam Gleave Dan Hendrycks Scott Emmons	2024/2/19
HarmBench: A standardized evaluation framework for automated red teaming and robust refusal	arXiv preprint arXiv:2402.04249	Mantas Mazeika Long Phan Xuwang Yin Andy Zou Zifan Wang ...	2024/2/6
Programmatic Evaluation of Rule-Following Behavior		Norman Mu Sarah Li Chen Zifan Wang Sizhe Chen Dan Hendrycks ...	2023/10/13
Identifying and mitigating the security risks of generative AI	Foundations and Trends® in Privacy and Security	Clark Barrett Brad Boyd Elie Bursztein Nicholas Carlini Brad Chen ...	2023/12/13
DecodingTrust: A comprehensive assessment of trustworthiness in GPT models	arXiv preprint arXiv:2306.11698	Boxin Wang Weixin Chen Hengzhi Pei Chulin Xie Mintong Kang ...	2023/6/20
How Hard is Trojan Detection in DNNs? Fooling Detectors With Evasive Trojans		Mantas Mazeika Andy Zou Akul Arora Pavel Pleskov Dawn Song ...	2023/10/13
Can LLMs Follow Simple Rules?	arXiv preprint arXiv:2311.04235	Norman Mu Sarah Chen Zifan Wang Sizhe Chen David Karamardian ...	2023/11/6
Natural Selection Favors AIs over Humans	arXiv preprint arXiv:2303.16200	Dan Hendrycks	2023/3/28
Representation engineering: A top-down approach to AI transparency	arXiv preprint arXiv:2310.01405	Andy Zou Long Phan Sarah Chen James Campbell Phillip Guo ...	2023/10/2
Robustness Evaluation of Proxy Models against Adversarial Optimization		Andy Zou Long Phan Nathaniel Li Jun Shern Chan Mantas Mazeika ...	2023/10/13
MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding	EMNLP	Steven H Wang Antoine Scardigli Leonard Tang Wei Chen Dimitry Levkin ...	2023/1/2
AI deception: A survey of examples, risks, and potential solutions	Patterns	Peter S Park Simon Goldstein Aidan O'Gara Michael Chen Dan Hendrycks	2023/8/28
Enhancing Neural Network Transparency through Representation Analysis		Andy Zou Long Phan Sarah Li Chen James Campbell Phillip Huang Guo ...	2023/10/13
AI risk-management standards profile for general-purpose AI systems (GPAIS) and foundation models	Center for Long-Term Cybersecurity, UC Berkeley. https://perma.cc/8W6P-2UUK	Anthony M Barrett Jessica Newman Brandie Nonnecke D Hendrycks Evan R Murphy ...	2023
Do the rewards justify the means? Measuring trade-offs between rewards and ethical behavior in the MACHIAVELLI benchmark	ICML 2023	Alexander Pan Jun Shern Chan Andy Zou Nathaniel Li Steven Basart ...	2023/4/6
Evaluating Robustness to Unforeseen Adversarial Attacks		Maximilian Kaufmann Daniel Kang Yi Sun Xuwang Yin Steven Basart ...	2023/10/13
An overview of catastrophic AI risks		Dan Hendrycks Mantas Mazeika Thomas Woodside	2023/6/21
Certified adversarial defenses meet out-of-distribution corruptions: Benchmarking robustness and simple baselines	European Conference on Computer Vision (ECCV)	Jiachen Sun Akshay Mehra Bhavya Kailkhura Pin-Yu Chen Dan Hendrycks ...	2022