Paul Röttger at University of Oxford

University	University of Oxford
Position	DPhil Student
Citations (all)	447
Citations (since 2020)	447
Cited By	1
h-index (all)	8
h-index (since 2020)	8
i10-index (all)	7
i10-index (since 2020)	7

From Languages to Geographies: Towards Evaluating Cultural Bias in Hate Speech Datasets

arXiv preprint arXiv:2404.17874

2024/4/27

Diyi Liu

H-Index: 1

Paul Röttger

H-Index: 1

Near to Mid-term Risks and Opportunities of Open Source Generative AI

arXiv preprint arXiv:2404.17047

2024/4/25

Francisco Eiras

H-Index: 3

Christian Schroeder De Witt

H-Index: 6

Supratik Mukhopadhyay

H-Index: 8

Adel Bibi

H-Index: 10

Matthew Jackson

H-Index: 0

Paul Röttger

H-Index: 1

Trevor Darrell

H-Index: 101

Yong Suk Lee

H-Index: 8

The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models

arXiv preprint arXiv:2404.16019

2024/4/24

Hannah Rose Kirk

H-Index: 1

Paul Röttger

H-Index: 1

Andrew Bean

H-Index: 2

Max Bartolo

H-Index: 3

He He

H-Index: 4

The benefits, risks and bounds of personalizing the alignment of large language models to individuals

2024/4/23

Hannah Rose Kirk

H-Index: 1

Paul Röttger

H-Index: 1

Introducing v0.5 of the AI Safety Benchmark from MLCommons

arXiv preprint arXiv:2404.12241

2024/4/18

Cody Coleman

H-Index: 11

Surgan Jandial

H-Index: 3

Foutse Khomh

H-Index: 33

Hannah Rose Kirk

H-Index: 1

Michael Kuchnik

H-Index: 2

Chris Lengerich

H-Index: 3

Bo Li

H-Index: 27

Yifan Mai

H-Index: 5

Priyanka Mary Mammen

H-Index: 3

Shafee Mohammed

H-Index: 1

Alicia Parrish

H-Index: 2

Eleonora Presani

H-Index: 15

Paul Röttger

H-Index: 1

Elizabeth Anne Watkins

H-Index: 2

Poonam Yadav

H-Index: 6

Yi Zeng

H-Index: 5

Wenhui Zhang

H-Index: 10

Jiacheng Zhu

H-Index: 3

Percy Liang

H-Index: 55

Joaquin Vanschoren

H-Index: 27

Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think

arXiv preprint arXiv:2404.08382

2024/4/12

Xinpeng Wang

H-Index: 3

Paul Röttger

H-Index: 1

SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety

2024/4/8

Paul Röttger

H-Index: 1

Dirk Hovy

H-Index: 26

Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset

NAACL 2024 (Main)

2024/3/28

Paul Röttger

H-Index: 1

Evaluating the Elementary Multilingual Capabilities of Large Language Models with MultiQ

arXiv preprint arXiv:2403.03814

2024/3/6

Paul Röttger

H-Index: 1

Anne Lauscher

H-Index: 10

Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models

arXiv preprint arXiv:2402.16786

2024/2/26

Paul Röttger

H-Index: 1

Valentin Hofmann

H-Index: 3

Valentina Pyatkin

H-Index: 1

Hannah Rose Kirk

H-Index: 1

Hinrich Schütze

H-Index: 48

Dirk Hovy

H-Index: 26

"My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models

arXiv preprint arXiv:2402.14499

2024/2/22

Xinpeng Wang

H-Index: 3

Paul Röttger

H-Index: 1

Frauke Kreuter

H-Index: 32

Dirk Hovy

H-Index: 26

SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models

arXiv preprint arXiv:2311.08370

2023/11/14

Hannah Rose Kirk

H-Index: 1

Paul Röttger

H-Index: 1

The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values

arXiv preprint arXiv:2310.07629

2023/10/11

Hannah Rose Kirk

H-Index: 1

Paul Röttger

H-Index: 1

The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising “Alignment” in Large Language Models

arXiv preprint arXiv:2310.02457

2023/10/3

Hannah Rose Kirk

H-Index: 1

Paul Röttger

H-Index: 1

Safety-Tuned LLaMAs: Lessons from Improving the Safety of Large Language Models that Follow Instructions

ICLR 2024

2023/9/14

Federico Bianchi

H-Index: 4

Giuseppe Attanasio

H-Index: 2

Paul Röttger

H-Index: 1

Dan Jurafsky

H-Index: 71

Tatsunori Hashimoto

H-Index: 17

James Zou

H-Index: 38

XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models

2023/8/2

Paul Röttger

H-Index: 1

Hannah Rose Kirk

H-Index: 1

Giuseppe Attanasio

H-Index: 2

Federico Bianchi

H-Index: 4

Dirk Hovy

H-Index: 26

The Ecological Fallacy in Annotation: Modelling Human Label Variation goes beyond Sociodemographics

2023/6/20

Paul Röttger

H-Index: 1

Dirk Hovy

H-Index: 26

Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback

arXiv preprint arXiv:2303.05453

2023/3/9

Hannah Rose Kirk

H-Index: 1

Paul Röttger

H-Index: 1

SemEval-2023 Task 10: Explainable Detection of Online Sexism

2023/3/7

Hannah Rose Kirk

H-Index: 1

Wenjie Yin

H-Index: 3

Paul Röttger

H-Index: 1

Tracking abuse on Twitter against football players in the 2021–22 Premier League Season

Available at SSRN 4403913

2022/8/2

Hannah Rose Kirk

H-Index: 1

Paul Röttger

H-Index: 1
