Hannah Rose Kirk

University of Oxford

H-index: 10

Europe-United Kingdom

About Hannah Rose Kirk

Hannah Rose Kirk is a distinguished researcher at the University of Oxford, with an h-index of 10 both overall and since 2020. She specializes in large language models, NLP, ethics in AI, alignment, and AI safety.

Her recent articles reflect a diverse array of research interests and contributions to the field:

Introducing v0.5 of the AI Safety Benchmark from MLCommons

Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models

Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation

DataPerf: Benchmarks for data-centric AI development

Visogender: A dataset for benchmarking gender bias in image-text pronoun resolution

The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models

The benefits, risks and bounds of personalizing the alignment of large language models to individuals

SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models

Hannah Rose Kirk Information

University: University of Oxford
Position: ___
Citations (all): 497
Citations (since 2020): 497
Cited by: 1
h-index (all): 10
h-index (since 2020): 10
i10-index (all): 11
i10-index (since 2020): 11

Email · University Profile Page · Google Scholar

Hannah Rose Kirk Skills & Research Interests

Large language models

NLP

Ethics in AI

Alignment

AI Safety

Top articles of Hannah Rose Kirk

Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models

arXiv preprint arXiv:2402.16786

2024/2/26

Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation

arXiv preprint arXiv:2403.12075

2024/2/14

Visogender: A dataset for benchmarking gender bias in image-text pronoun resolution

Advances in Neural Information Processing Systems

2024/2/13

The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models

arXiv preprint arXiv:2404.16019

2024/4/24

The benefits, risks and bounds of personalizing the alignment of large language models to individuals

2024/4/23

SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models

arXiv preprint arXiv:2311.08370

2023/11/14

The past, present and better future of feedback learning in large language models for subjective human preferences and values

arXiv preprint arXiv:2310.07629

2023/10/11

The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models

arXiv preprint arXiv:2310.02457

2023/10/3

Casteist but not racist? Quantifying disparities in large language model bias between India and the West

arXiv preprint arXiv:2309.08573

2023/9/15

XSTest: A test suite for identifying exaggerated safety behaviours in large language models

arXiv preprint arXiv:2308.01263

2023/8/2

DoDo Learning: DOmain-DemOgraphic Transfer in Language Models for Detecting Abuse Targeted at Public Figures

arXiv preprint arXiv:2307.16811

2023/7/31

Auditing large language models: a three-layered approach

AI and Ethics

2023/5/30

Balancing the picture: Debiasing vision-language datasets with synthetic contrast sets

arXiv preprint arXiv:2305.15407

2023/5/24

Assessing language model deployment with risk cards

arXiv preprint arXiv:2303.18190

2023/3/31

SemEval-2023 Task 10: Explainable detection of online sexism

2023/3/7

Is More Data Better? Re-thinking the Importance of Efficiency in Abusive Language Detection with Transformers-Based Active Learning

arXiv preprint arXiv:2209.10193

2022/9/21

Tracking abuse on Twitter against football players in the 2021–22 Premier League Season

Available at SSRN 4403913

2022/8/2

Proceedings of the First Workshop on Dynamic Adversarial Data Collection

2022/7

Co-Authors

Aleksandar Shtedritski

Paul Röttger