
David Scott Krueger

Université de Montréal

H-index: 19

North America-Canada

About David Scott Krueger

David Scott Krueger is a researcher at Université de Montréal who specializes in AI alignment and deep learning. He has an h-index of 19, both overall and since 2020.

His recent articles include:

Affirmative Safety: An Approach to Risk Management for Advanced AI

Visibility into AI Agents

Foundational challenges in assuring alignment and safety of large language models

Safety Cases: Justifying the Safety of Advanced AI Systems

A Generative Model of Symmetry Transformations

Thinker: Learning to Plan and Act

Black-Box Access is Insufficient for Rigorous AI Audits

Blockwise self-supervised learning at scale

David Scott Krueger Information

University	Université de Montréal
Position	PhD Student
Citations (all)	6667
Citations (since 2020)	6122
Cited By	2289
h-index (all)	19
h-index (since 2020)	19
i10-index (all)	25
i10-index (since 2020)	25

David Scott Krueger Skills & Research Interests

AI Alignment

Deep Learning

Top articles of David Scott Krueger

Title	Journal	Author(s)	Publication Date
Affirmative Safety: An Approach to Risk Management for Advanced AI	Available at SSRN 4806274	Akash Wasil Joshua Clymer David Krueger Emily Dardaman Simeon Campos ...	2024/4/24
Visibility into AI Agents	arXiv preprint arXiv:2401.13138	Alan Chan Carson Ezell Max Kaufmann Kevin Wei Lewis Hammond ...	2024/1/23
Foundational challenges in assuring alignment and safety of large language models	arXiv preprint arXiv:2404.09932	Usman Anwar Abulhair Saparov Javier Rando Daniel Paleka Miles Turpin ...	2024/4/15
Safety Cases: Justifying the Safety of Advanced AI Systems	arXiv preprint arXiv:2403.10462	Joshua Clymer Nick Gabrieli David Krueger Thomas Larsen	2024/3/15
A Generative Model of Symmetry Transformations	arXiv preprint arXiv:2403.01946	James Urquhart Allingham Bruno Kacper Mlodozeniec Shreyas Padhy Javier Antorán David Krueger ...	2024/3/4
Thinker: Learning to Plan and Act		Stephen Chung Ivan Anokhin David Krueger	2023/7/27
Black-Box Access is Insufficient for Rigorous AI Audits	arXiv preprint arXiv:2401.14446	Stephen Casper Carson Ezell Charlotte Siegmann Noam Kolt Taylor Lynn Curtis ...	2024/1/25
Blockwise self-supervised learning at scale	arXiv preprint arXiv:2302.01647	Shoaib Ahmed Siddiqui David Krueger Yann LeCun Stéphane Deny	2023/2/3
Open problems and fundamental limitations of reinforcement learning from human feedback	arXiv preprint arXiv:2307.15217	Stephen Casper Xander Davies Claudia Shi Thomas Krendl Gilbert Jérémy Scheurer ...	2023/7/27
BaDLoss: Backdoor Detection via Loss Dynamics		Neel Alex Shoaib Ahmed Siddiqui Amartya Sanyal David Krueger	2023/10/13
Goal Misgeneralization as Implicit Goal Conditioning		Diego Dorn Neel Alex David Krueger	2023/11/27
On the fragility of learned reward functions	arXiv preprint arXiv:2301.03652	Lev McKinney Yawen Duan David Krueger Adam Gleave	2023/1/9
Mechanistic mode connectivity		Ekdeep Singh Lubana Eric J Bigelow Robert P Dick David Krueger Hidenori Tanaka	2023/7/3
Towards Meta-Models for Automated Interpretability		Lauro Langosco Neel Alex William Baker David John Quarel Herbie Bradley ...	2023/10/13
Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models	arXiv preprint arXiv:2312.14751	Alan Chan Ben Bucknall Herbie Bradley David Krueger	2023/12/22
Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks	arXiv preprint arXiv:2311.12786	Samyak Jain Robert Kirk Ekdeep Singh Lubana Robert P Dick Hidenori Tanaka ...	2023/11/21
Harms from increasingly agentic algorithmic systems		Alan Chan Rebecca Salganik Alva Markelius Chris Pang Nitarshan Rajkumar ...	2023/6/12
Reward model ensembles help mitigate overoptimization	arXiv preprint arXiv:2310.02743	Thomas Coste Usman Anwar Robert Kirk David Krueger	2023/10/4
Characterizing manipulation from AI systems	EAAMO 2023	Micah Carroll* Alan Chan* Henry Ashton David Krueger	2023/3/16
(Out-of-context) Meta-learning in Language Models		Dmitrii Krasheninnikov Egor Krasheninnikov Bruno Kacper Mlodozeniec David Krueger	2023/12/12