Himabindu Lakkaraju

Harvard University

H-index: 34

United States (North America)

About Himabindu Lakkaraju

Himabindu Lakkaraju is a distinguished researcher at Harvard University, specializing in Explainable & Fair ML, Adversarial Robustness, Human Centric ML, and ML for Healthcare & Law. She has an h-index of 34 overall and 32 since 2020.

Her recent articles reflect a diverse array of research interests and contributions to the field:

More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness

Quantifying uncertainty in natural language explanations of large language models

Towards Safe and Aligned Large Language Models for Medicine

Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems

Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE)

Opening the Black Box of Large Language Models: Two Views on Holistic Interpretability

Which models have perceptually-aligned gradients? An explanation via off-manifold robustness

Discriminative Feature Attributions: Bridging Post Hoc Explainability and Inherent Interpretability

Himabindu Lakkaraju Information

University: Harvard University
Position: Assistant Professor
Citations (all): 6902
Citations (since 2020): 6004
Cited by: 2561
h-index (all): 34
h-index (since 2020): 32
i10-index (all): 49
i10-index (since 2020): 46

Himabindu Lakkaraju Skills & Research Interests

Explainable & Fair ML

Adversarial Robustness

Human Centric ML

ML for Healthcare & Law

Top articles of Himabindu Lakkaraju

More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness

arXiv preprint arXiv:2404.18870

2024/4/29

Himabindu Lakkaraju

Quantifying uncertainty in natural language explanations of large language models

2024/4/18

Chirag Agarwal, Himabindu Lakkaraju

Towards Safe and Aligned Large Language Models for Medicine

arXiv preprint arXiv:2403.03744

2024/3/6

Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems

arXiv preprint arXiv:2402.17840

2024/2/27

Hanlin Zhang, Eric Xing, Himabindu Lakkaraju

Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE)

arXiv preprint arXiv:2402.10376

2024/2/16

Alex Oesterling, Himabindu Lakkaraju

Opening the Black Box of Large Language Models: Two Views on Holistic Interpretability

arXiv preprint arXiv:2402.10688

2024/2/16

Haiyan Zhao, Fan Yang, Himabindu Lakkaraju

Which models have perceptually-aligned gradients? An explanation via off-manifold robustness

Advances in Neural Information Processing Systems

2024/2/13

Sebastian Bordt, Himabindu Lakkaraju

Discriminative Feature Attributions: Bridging Post Hoc Explainability and Inherent Interpretability

Advances in Neural Information Processing Systems

2024/2/13

Himabindu Lakkaraju

Post hoc explanations of language models can improve language models

arXiv preprint arXiv:2305.11426

2023/5/19

Understanding the Effects of Iterative Prompting on Truthfulness

arXiv preprint arXiv:2402.06625

2024/2/9

Chirag Agarwal, Himabindu Lakkaraju

Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models

arXiv preprint arXiv:2402.04614

2024/2/7

Chirag Agarwal, Himabindu Lakkaraju

OpenXAI: Towards a Transparent Evaluation of Model Explanations

Advances in Neural Information Processing Systems

2022/12/6

Consistent explanations in the face of model indeterminacy via ensembling

2023/6/9

Dan Ley, Himabindu Lakkaraju

Investigating the Fairness of Large Language Models for Predictions on Tabular Data

arXiv preprint arXiv:2310.14607

2023/10/23

Yanchen Liu, Jiaqi Ma, Himabindu Lakkaraju

In-context unlearning: Language models as few shot unlearners

arXiv preprint arXiv:2310.07579

2023/10/11

Martin Pawelczyk, Himabindu Lakkaraju

Word-level explanations for analyzing bias in text-to-image models

arXiv preprint arXiv:2306.05500

2023/6/3

Alexander Lin, Himabindu Lakkaraju

Are Large Language Models Post Hoc Explainers?

arXiv preprint arXiv:2310.05797

2023/10/9

On the Trade-offs between Adversarial Robustness and Actionable Explanations

arXiv preprint arXiv:2309.16452

2023/9/28

Chirag Agarwal, Himabindu Lakkaraju

The Disagreement Problem in Explainable Machine Learning: A Practitioner's Perspective

2023/5/23

Shahin Jabbari, Steven Wu, Himabindu Lakkaraju

Certifying LLM safety against adversarial prompting

arXiv preprint arXiv:2309.02705

2023/9/6

See the list of professors at Himabindu Lakkaraju's university (Harvard University)
