Dan Hendrycks

University of California, Berkeley

H-index: 31

North America-United States

About Dan Hendrycks

Dan Hendrycks is a researcher at the University of California, Berkeley, with an h-index of 31 overall and 31 since 2020. He specializes in ML Safety, AI Safety, Machine Ethics, and ML Reliability.

His recent articles reflect a diverse array of research interests and contributions to the field:

Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression

The WMDP benchmark: Measuring and reducing malicious use with unlearning

Uncovering Latent Human Wellbeing in Language Model Embeddings

HarmBench: A standardized evaluation framework for automated red teaming and robust refusal

Programmatic Evaluation of Rule-Following Behavior

Identifying and mitigating the security risks of generative AI

DecodingTrust: A comprehensive assessment of trustworthiness in GPT models

How Hard is Trojan Detection in DNNs? Fooling Detectors With Evasive Trojans

Dan Hendrycks Information

University: University of California, Berkeley
Position: PhD Student
Citations (all): 20942
Citations (since 2020): 20754
Cited By: 4279
h-index (all): 31
h-index (since 2020): 31
i10-index (all): 39
i10-index (since 2020): 39

Dan Hendrycks Skills & Research Interests

ML Safety

AI Safety

Machine Ethics

ML Reliability

Top articles of Dan Hendrycks

Title: Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
Journal: arXiv preprint arXiv:2403.15447
Author(s): Junyuan Hong, Jinhao Duan, Chenhui Zhang, Zhangheng Li, Chulin Xie, ...
Publication Date: 2024/3/18

Title: The WMDP benchmark: Measuring and reducing malicious use with unlearning
Journal: arXiv preprint arXiv:2403.03218
Author(s): Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, ...
Publication Date: 2024/3/5

Title: Uncovering Latent Human Wellbeing in Language Model Embeddings
Journal: arXiv preprint arXiv:2402.11777
Author(s): Pedro Freire, ChengCheng Tan, Adam Gleave, Dan Hendrycks, Scott Emmons
Publication Date: 2024/2/19

Title: HarmBench: A standardized evaluation framework for automated red teaming and robust refusal
Journal: arXiv preprint arXiv:2402.04249
Author(s): Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, ...
Publication Date: 2024/2/6

Title: Programmatic Evaluation of Rule-Following Behavior
Author(s): Norman Mu, Sarah Li Chen, Zifan Wang, Sizhe Chen, Dan Hendrycks, ...
Publication Date: 2023/10/13

Title: Identifying and mitigating the security risks of generative AI
Journal: Foundations and Trends® in Privacy and Security
Author(s): Clark Barrett, Brad Boyd, Elie Bursztein, Nicholas Carlini, Brad Chen, ...
Publication Date: 2023/12/13

Title: DecodingTrust: A comprehensive assessment of trustworthiness in GPT models
Journal: arXiv preprint arXiv:2306.11698
Author(s): Boxin Wang, Weixin Chen, Hengzhi Pei, Chulin Xie, Mintong Kang, ...
Publication Date: 2023/6/20

Title: How Hard is Trojan Detection in DNNs? Fooling Detectors With Evasive Trojans
Author(s): Mantas Mazeika, Andy Zou, Akul Arora, Pavel Pleskov, Dawn Song, ...
Publication Date: 2023/10/13

Title: Can LLMs Follow Simple Rules?
Journal: arXiv preprint arXiv:2311.04235
Author(s): Norman Mu, Sarah Chen, Zifan Wang, Sizhe Chen, David Karamardian, ...
Publication Date: 2023/11/6

Title: Natural Selection Favors AIs over Humans
Journal: arXiv preprint arXiv:2303.16200
Author(s): Dan Hendrycks
Publication Date: 2023/3/28

Title: Representation engineering: A top-down approach to AI transparency
Journal: arXiv preprint arXiv:2310.01405
Author(s): Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, ...
Publication Date: 2023/10/2

Title: Robustness Evaluation of Proxy Models against Adversarial Optimization
Author(s): Andy Zou, Long Phan, Nathaniel Li, Jun Shern Chan, Mantas Mazeika, ...
Publication Date: 2023/10/13

Title: MAUD: An Expert-Annotated Legal NLP Dataset for Merger Agreement Understanding
Journal: EMNLP
Author(s): Steven H Wang, Antoine Scardigli, Leonard Tang, Wei Chen, Dimitry Levkin, ...
Publication Date: 2023/1/2

Title: AI deception: A survey of examples, risks, and potential solutions
Journal: Patterns
Author(s): Peter S Park, Simon Goldstein, Aidan O'Gara, Michael Chen, Dan Hendrycks
Publication Date: 2023/8/28

Title: Enhancing Neural Network Transparency through Representation Analysis
Author(s): Andy Zou, Long Phan, Sarah Li Chen, James Campbell, Phillip Huang Guo, ...
Publication Date: 2023/10/13

Title: AI risk-management standards profile for general-purpose AI systems (GPAIS) and foundation models
Journal: Center for Long-Term Cybersecurity, UC Berkeley. https://perma.cc/8W6P-2UUK
Author(s): Anthony M Barrett, Jessica Newman, Brandie Nonnecke, D Hendrycks, Evan R Murphy, ...
Publication Date: 2023

Title: Do the rewards justify the means? Measuring trade-offs between rewards and ethical behavior in the MACHIAVELLI benchmark
Journal: ICML 2023
Author(s): Alexander Pan, Chan Jun Shern, Andy Zou, Nathaniel Li, Steven Basart, ...
Publication Date: 2023/4/6

Title: Evaluating Robustness to Unforeseen Adversarial Attacks
Author(s): Maximilian Kaufmann, Daniel Kang, Yi Sun, Xuwang Yin, Steven Basart, ...
Publication Date: 2023/10/13

Title: An overview of catastrophic AI risks
Author(s): Dan Hendrycks, Mantas Mazeika, Thomas Woodside
Publication Date: 2023/6/21

Title: Certified adversarial defenses meet out-of-distribution corruptions: Benchmarking robustness and simple baselines
Journal: European Conference on Computer Vision (ECCV)
Author(s): Jiachen Sun, Akshay Mehra, Bhavya Kailkhura, Pin-Yu Chen, Dan Hendrycks, ...
Publication Date: 2022

Co-Authors

Dawn Song
University of California, Berkeley
H-index: 143

Thomas Dietterich
Oregon State University
H-index: 89

Kevin Gimpel
Toyota Technological Institute
H-index: 46

Jacob Steinhardt
Stanford University
H-index: 42
