Paul Röttger

University of Oxford

H-index: 8

Europe-United Kingdom

About Paul Röttger

Paul Röttger is a researcher at the University of Oxford with an h-index of 8, all of it accrued since 2020. He specializes in Natural Language Processing, Large Language Models, Online Harms, and AI Safety.

His recent articles reflect a diverse array of research interests and contributions to the field:

From Languages to Geographies: Towards Evaluating Cultural Bias in Hate Speech Datasets

Near to Mid-term Risks and Opportunities of Open Source Generative AI

The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models

The benefits, risks and bounds of personalizing the alignment of large language models to individuals

Introducing v0.5 of the AI Safety Benchmark from MLCommons

Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think

SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety

Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset

Paul Röttger Information

University: University of Oxford

Position: DPhil Student

Citations (all): 447

Citations (since 2020): 447

Cited By: 1

h-index (all): 8

h-index (since 2020): 8

i10-index (all): 7

i10-index (since 2020): 7

Paul Röttger Skills & Research Interests

Natural Language Processing

Large Language Models

Online Harms

AI Safety

Top articles of Paul Röttger

From Languages to Geographies: Towards Evaluating Cultural Bias in Hate Speech Datasets

arXiv preprint arXiv:2404.17874

2024/4/27

Diyi Liu (H-Index: 1)

Paul Röttger (H-Index: 1)

Near to Mid-term Risks and Opportunities of Open Source Generative AI

arXiv preprint arXiv:2404.17047

2024/4/25

The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models

arXiv preprint arXiv:2404.16019

2024/4/24

The benefits, risks and bounds of personalizing the alignment of large language models to individuals

2024/4/23

Hannah Rose Kirk (H-Index: 1)

Paul Röttger (H-Index: 1)

Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think

arXiv preprint arXiv:2404.08382

2024/4/12

Xinpeng Wang (H-Index: 3)

Paul Röttger (H-Index: 1)

SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety

2024/4/8

Paul Röttger (H-Index: 1)

Dirk Hovy (H-Index: 26)

Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset

NAACL 2024 (Main)

2024/3/28

Paul Röttger (H-Index: 1)

Evaluating the Elementary Multilingual Capabilities of Large Language Models with MultiQ

arXiv preprint arXiv:2403.03814

2024/3/6

Paul Röttger (H-Index: 1)

Anne Lauscher (H-Index: 10)

Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models

arXiv preprint arXiv:2402.16786

2024/2/26

"My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models

arXiv preprint arXiv:2402.14499

2024/2/22

SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models

arXiv preprint arXiv:2311.08370

2023/11/14

Hannah Rose Kirk (H-Index: 1)

Paul Röttger (H-Index: 1)

The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values

arXiv preprint arXiv:2310.07629

2023/10/11

Hannah Rose Kirk (H-Index: 1)

Paul Röttger (H-Index: 1)

The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising “Alignment” in Large Language Models

arXiv preprint arXiv:2310.02457

2023/10/3

Hannah Rose Kirk (H-Index: 1)

Paul Röttger (H-Index: 1)

Safety-Tuned LLaMAs: Lessons from Improving the Safety of Large Language Models that Follow Instructions

ICLR 2024

2023/9/14

XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models

2023/8/2

The Ecological Fallacy in Annotation: Modelling Human Label Variation goes beyond Sociodemographics

2023/6/20

Paul Röttger (H-Index: 1)

Dirk Hovy (H-Index: 26)

Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback

arXiv preprint arXiv:2303.05453

2023/3/9

Hannah Rose Kirk (H-Index: 1)

Paul Röttger (H-Index: 1)

SemEval-2023 Task 10: Explainable Detection of Online Sexism

2023/3/7

Tracking abuse on Twitter against football players in the 2021–22 Premier League Season

Available at SSRN 4403913

2022/8/2

Hannah Rose Kirk (H-Index: 1)

Paul Röttger (H-Index: 1)
