Suvrit Sra

Massachusetts Institute of Technology

H-index: 65

North America-United States

About Suvrit Sra

Suvrit Sra is a researcher at the Massachusetts Institute of Technology with an h-index of 65 overall and 50 since 2020. He specializes in nonconvex optimization, deep learning theory, matrix analysis, and geometric optimization.

His recent articles include:

The crucial role of normalization in sharpness-aware minimization

Transformers learn to implement preconditioned gradient descent for in-context learning

Invex Programs: First Order Algorithms and Their Convergence

Sion’s minimax theorem in geodesic metric spaces and a Riemannian extragradient algorithm

Global optimality for Euclidean CCCP under Riemannian convexity

Toward Understanding State Representation Learning in MuZero: A Case Study in Linear Quadratic Gaussian Control

On the training instability of shuffling SGD with batch normalization

Transformers implement functional gradient descent to learn non-linear functions in context

Suvrit Sra Information

University: Massachusetts Institute of Technology

Position: ___

Citations (all): 16962

Citations (since 2020): 9417

Cited by: 11191

h-index (all): 65

h-index (since 2020): 50

i10-index (all): 145

i10-index (since 2020): 117

Suvrit Sra Skills & Research Interests

Nonconvex Optimization

Deep Learning Theory

Matrix Analysis

Geometric Optimization

Top articles of Suvrit Sra

Title: The crucial role of normalization in sharpness-aware minimization
Journal: Advances in Neural Information Processing Systems
Author(s): Yan Dai, Kwangjun Ahn, Suvrit Sra
Publication Date: 2024/2/13

Title: Transformers learn to implement preconditioned gradient descent for in-context learning
Journal: Advances in Neural Information Processing Systems
Author(s): Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, Suvrit Sra
Publication Date: 2024/2/13

Title: Invex Programs: First Order Algorithms and Their Convergence
Journal: arXiv preprint arXiv:2307.04456
Author(s): Adarsh Barik, Suvrit Sra, Jean Honorio
Publication Date: 2023/7/10

Title: Sion’s minimax theorem in geodesic metric spaces and a Riemannian extragradient algorithm
Journal: SIAM Journal on Optimization
Author(s): Peiyuan Zhang, Jingzhao Zhang, Suvrit Sra
Publication Date: 2023/12/31

Title: Global optimality for Euclidean CCCP under Riemannian convexity
Author(s): Melanie Weber, Suvrit Sra
Publication Date: 2023/7

Title: Toward Understanding State Representation Learning in MuZero: A Case Study in Linear Quadratic Gaussian Control
Author(s): Yi Tian, Kaiqing Zhang, Russ Tedrake, Suvrit Sra
Publication Date: 2023/12/13

Title: On the training instability of shuffling SGD with batch normalization
Author(s): David X Wu, Chulhee Yun, Suvrit Sra
Publication Date: 2023/2/24

Title: Transformers implement functional gradient descent to learn non-linear functions in context
Journal: arXiv preprint arXiv:2312.06528
Author(s): Xiang Cheng, Yuxin Chen, Suvrit Sra
Publication Date: 2023/12/11

Title: Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control?
Author(s): Yi Tian, Kaiqing Zhang, Russ Tedrake, Suvrit Sra
Publication Date: 2022/12/30

Title: Functions with Positive Differences on Convex Cones
Journal: Results in Mathematics
Author(s): Constantin P Niculescu, Suvrit Sra
Publication Date: 2023/12

Title: How to escape sharp minima
Journal: arXiv preprint arXiv:2305.15659
Author(s): Kwangjun Ahn, Ali Jadbabaie, Suvrit Sra
Publication Date: 2023/5/25

Title: Linear attention is (maybe) all you need (to understand transformer optimization)
Author(s): Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, ...
Publication Date: 2023/10/2

Title: Theory and algorithms for diffusion processes on Riemannian manifolds
Journal: arXiv preprint arXiv:2204.13665
Author(s): Xiang Cheng, Jingzhao Zhang, Suvrit Sra
Publication Date: 2022/4/28

Title: Understanding Riemannian acceleration via a proximal extragradient framework
Author(s): Jikai Jin, Suvrit Sra
Publication Date: 2022/6/28

Title: Efficient sampling on Riemannian manifolds via Langevin MCMC
Journal: Advances in Neural Information Processing Systems
Author(s): Xiang Cheng, Jingzhao Zhang, Suvrit Sra
Publication Date: 2022/12/6

Title: Sign and basis invariant networks for spectral graph representation learning
Journal: ICLR 2023
Author(s): Derek Lim, Joshua Robinson, Lingxiao Zhao, Tess Smidt
Publication Date: 2022

Title: Understanding the unstable convergence of gradient descent
Journal: ICML 2022 (arXiv:2204.01050)
Author(s): Kwangjun Ahn, Jingzhao Zhang, Suvrit Sra
Publication Date: 2022/4/3

Title: CCCP is Frank-Wolfe in disguise
Journal: Advances in Neural Information Processing Systems
Author(s): Alp Yurtsever, Suvrit Sra
Publication Date: 2022/12/6

Title: Understanding Nesterov's Acceleration via Proximal Point Method
Author(s): Kwangjun Ahn, Suvrit Sra
Publication Date: 2022

Title: Max-margin contrastive learning
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Author(s): Anshul Shah, Suvrit Sra, Rama Chellappa, Anoop Cherian
Publication Date: 2022/6/28

Co-Authors

Ali Jadbabaie, Massachusetts Institute of Technology (H-index: 71)

Stephen Wright, University of Wisconsin-Madison (H-index: 70)

Barnabas Poczos, Carnegie Mellon University (H-index: 68)

Arindam Banerjee, University of Illinois at Urbana-Champaign (H-index: 63)

Stefanie Jegelka, Massachusetts Institute of Technology (H-index: 49)

Stefan Harmeling, Heinrich-Heine-Universität Düsseldorf (H-index: 38)
