Paul Röttger
University of Oxford
H-Index: 8
Europe-United Kingdom
Top articles of Paul Röttger
From Languages to Geographies: Towards Evaluating Cultural Bias in Hate Speech Datasets
arXiv preprint arXiv:2404.17874
2024/4/27
Diyi Liu
H-Index: 1
Paul Röttger
H-Index: 1
Near to Mid-term Risks and Opportunities of Open Source Generative AI
arXiv preprint arXiv:2404.17047
2024/4/25
The PRISM Alignment Project: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models
arXiv preprint arXiv:2404.16019
2024/4/24
Hannah Rose Kirk
H-Index: 1
Paul Röttger
H-Index: 1
Andrew Bean
H-Index: 2
Max Bartolo
H-Index: 3
He He
H-Index: 4
The benefits, risks and bounds of personalizing the alignment of large language models to individuals
2024/4/23
Hannah Rose Kirk
H-Index: 1
Paul Röttger
H-Index: 1
Introducing v0.5 of the AI Safety Benchmark from MLCommons
arXiv preprint arXiv:2404.12241
2024/4/18
Cody Coleman
H-Index: 11
Surgan Jandial
H-Index: 3
Foutse Khomh
H-Index: 33
Hannah Rose Kirk
H-Index: 1
Michael Kuchnik
H-Index: 2
Chris Lengerich
H-Index: 3
Bo Li
H-Index: 27
Yifan Mai
H-Index: 5
Priyanka Mary Mammen
H-Index: 3
Shafee Mohammed
H-Index: 1
Alicia Parrish
H-Index: 2
Eleonora Presani
H-Index: 15
Paul Röttger
H-Index: 1
Elizabeth Anne Watkins
H-Index: 2
Poonam Yadav
H-Index: 6
Yi Zeng
H-Index: 5
Wenhui Zhang
H-Index: 10
Jiacheng Zhu
H-Index: 3
Percy Liang
H-Index: 55
Joaquin Vanschoren
H-Index: 27
Look at the Text: Instruction-Tuned Language Models are More Robust Multiple Choice Selectors than You Think
arXiv preprint arXiv:2404.08382
2024/4/12
Xinpeng Wang
H-Index: 3
Paul Röttger
H-Index: 1
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
2024/4/8
Paul Röttger
H-Index: 1
Dirk Hovy
H-Index: 26
Improving Adversarial Data Collection by Supporting Annotators: Lessons from GAHD, a German Hate Speech Dataset
NAACL 2024 (Main)
2024/3/28
Paul Röttger
H-Index: 1
Evaluating the Elementary Multilingual Capabilities of Large Language Models with MultiQ
arXiv preprint arXiv:2403.03814
2024/3/6
Paul Röttger
H-Index: 1
Anne Lauscher
H-Index: 10
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models
arXiv preprint arXiv:2402.16786
2024/2/26
"My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models
arXiv preprint arXiv:2402.14499
2024/2/22
SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models
arXiv preprint arXiv:2311.08370
2023/11/14
Hannah Rose Kirk
H-Index: 1
Paul Röttger
H-Index: 1
The Past, Present and Better Future of Feedback Learning in Large Language Models for Subjective Human Preferences and Values
arXiv preprint arXiv:2310.07629
2023/10/11
Hannah Rose Kirk
H-Index: 1
Paul Röttger
H-Index: 1
The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising “Alignment” in Large Language Models
arXiv preprint arXiv:2310.02457
2023/10/3
Hannah Rose Kirk
H-Index: 1
Paul Röttger
H-Index: 1
Safety-Tuned LLaMAs: Lessons from Improving the Safety of Large Language Models that Follow Instructions
ICLR 2024
2023/9/14
XSTest: A Test Suite for Identifying Exaggerated Safety Behaviours in Large Language Models
2023/8/2
The Ecological Fallacy in Annotation: Modelling Human Label Variation goes beyond Sociodemographics
2023/6/20
Paul Röttger
H-Index: 1
Dirk Hovy
H-Index: 26
Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback
arXiv preprint arXiv:2303.05453
2023/3/9
Hannah Rose Kirk
H-Index: 1
Paul Röttger
H-Index: 1
SemEval-2023 Task 10: Explainable Detection of Online Sexism
2023/3/7
Tracking abuse on Twitter against football players in the 2021–22 Premier League Season
Available at SSRN 4403913
2022/8/2
Hannah Rose Kirk
H-Index: 1
Paul Röttger
H-Index: 1