David Scott Krueger

Université de Montréal

H-index: 19

North America-Canada

About David Scott Krueger

David Scott Krueger is a researcher at Université de Montréal specializing in AI Alignment and Deep Learning. He has an h-index of 19 overall and 19 since 2020.

His recent articles reflect a diverse array of research interests and contributions to the field:

Affirmative Safety: An Approach to Risk Management for Advanced AI

Visibility into AI Agents

Foundational challenges in assuring alignment and safety of large language models

Safety Cases: Justifying the Safety of Advanced AI Systems

A Generative Model of Symmetry Transformations

Thinker: Learning to Plan and Act

Black-Box Access is Insufficient for Rigorous AI Audits

Blockwise self-supervised learning at scale

David Scott Krueger Information

University: Université de Montréal

Position: PhD Student

Citations (all): 6667

Citations (since 2020): 6122

Cited By: 2289

h-index (all): 19

h-index (since 2020): 19

i10-index (all): 25

i10-index (since 2020): 25

David Scott Krueger Skills & Research Interests

AI Alignment

Deep Learning

Top articles of David Scott Krueger

Title: Affirmative Safety: An Approach to Risk Management for Advanced AI
Journal: Available at SSRN 4806274
Author(s): Akash Wasil, Joshua Clymer, David Krueger, Emily Dardaman, Simeon Campos, ...
Publication Date: 2024/4/24

Title: Visibility into AI Agents
Journal: arXiv preprint arXiv:2401.13138
Author(s): Alan Chan, Carson Ezell, Max Kaufmann, Kevin Wei, Lewis Hammond, ...
Publication Date: 2024/1/23

Title: Foundational challenges in assuring alignment and safety of large language models
Journal: arXiv preprint arXiv:2404.09932
Author(s): Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, ...
Publication Date: 2024/4/15

Title: Safety Cases: Justifying the Safety of Advanced AI Systems
Journal: arXiv preprint arXiv:2403.10462
Author(s): Joshua Clymer, Nick Gabrieli, David Krueger, Thomas Larsen
Publication Date: 2024/3/15

Title: A Generative Model of Symmetry Transformations
Journal: arXiv preprint arXiv:2403.01946
Author(s): James Urquhart Allingham, Bruno Kacper Mlodozeniec, Shreyas Padhy, Javier Antorán, David Krueger, ...
Publication Date: 2024/3/4

Title: Thinker: Learning to Plan and Act
Author(s): Stephen Chung, Ivan Anokhin, David Krueger
Publication Date: 2023/7/27

Title: Black-Box Access is Insufficient for Rigorous AI Audits
Journal: arXiv preprint arXiv:2401.14446
Author(s): Stephen Casper, Carson Ezell, Charlotte Siegmann, Noam Kolt, Taylor Lynn Curtis, ...
Publication Date: 2024/1/25

Title: Blockwise self-supervised learning at scale
Journal: arXiv preprint arXiv:2302.01647
Author(s): Shoaib Ahmed Siddiqui, David Krueger, Yann LeCun, Stéphane Deny
Publication Date: 2023/2/3

Title: Open problems and fundamental limitations of reinforcement learning from human feedback
Journal: arXiv preprint arXiv:2307.15217
Author(s): Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, ...
Publication Date: 2023/7/27

Title: BaDLoss: Backdoor Detection via Loss Dynamics
Author(s): Neel Alex, Shoaib Ahmed Siddiqui, Amartya Sanyal, David Krueger
Publication Date: 2023/10/13

Title: Goal Misgeneralization as Implicit Goal Conditioning
Author(s): Diego Dorn, Neel Alex, David Krueger
Publication Date: 2023/11/27

Title: On the fragility of learned reward functions
Journal: arXiv preprint arXiv:2301.03652
Author(s): Lev McKinney, Yawen Duan, David Krueger, Adam Gleave
Publication Date: 2023/1/9

Title: Mechanistic mode connectivity
Author(s): Ekdeep Singh Lubana, Eric J Bigelow, Robert P Dick, David Krueger, Hidenori Tanaka
Publication Date: 2023/7/3

Title: Towards Meta-Models for Automated Interpretability
Author(s): Lauro Langosco, Neel Alex, William Baker, David John Quarel, Herbie Bradley, ...
Publication Date: 2023/10/13

Title: Hazards from Increasingly Accessible Fine-Tuning of Downloadable Foundation Models
Journal: arXiv preprint arXiv:2312.14751
Author(s): Alan Chan, Ben Bucknall, Herbie Bradley, David Krueger
Publication Date: 2023/12/22

Title: Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks
Journal: arXiv preprint arXiv:2311.12786
Author(s): Samyak Jain, Robert Kirk, Ekdeep Singh Lubana, Robert P Dick, Hidenori Tanaka, ...
Publication Date: 2023/11/21

Title: Harms from increasingly agentic algorithmic systems
Author(s): Alan Chan, Rebecca Salganik, Alva Markelius, Chris Pang, Nitarshan Rajkumar, ...
Publication Date: 2023/6/12

Title: Reward model ensembles help mitigate overoptimization
Journal: arXiv preprint arXiv:2310.02743
Author(s): Thomas Coste, Usman Anwar, Robert Kirk, David Krueger
Publication Date: 2023/10/4

Title: Characterizing manipulation from AI systems
Journal: EAAMO 2023
Author(s): Micah Carroll*, Alan Chan*, Henry Ashton, David Krueger
Publication Date: 2023/3/16

Title: (Out-of-context) Meta-learning in Language Models
Author(s): Dmitrii Krasheninnikov, Egor Krasheninnikov, Bruno Kacper Mlodozeniec, David Krueger
Publication Date: 2023/12/12

Co-Authors

Yoshua Bengio
Université de Montréal
H-index: 227

Aaron Courville
Université de Montréal
H-index: 98

Simon Lacoste-Julien
Université de Montréal
H-index: 46

Asja Fischer
Ruhr-Universität Bochum
H-index: 28

Chin-Wei Huang
Université de Montréal
H-index: 16

Emmanuel Bengio
McGill University
H-index: 15
