Suvrit Sra
Massachusetts Institute of Technology
H-index: 65
North America-United States
Top articles of Suvrit Sra
Title | Journal | Author(s) | Publication Date |
---|---|---|---|
The crucial role of normalization in sharpness-aware minimization | Advances in Neural Information Processing Systems | Yan Dai Kwangjun Ahn Suvrit Sra | 2024/2/13 |
Transformers learn to implement preconditioned gradient descent for in-context learning | Advances in Neural Information Processing Systems | Kwangjun Ahn Xiang Cheng Hadi Daneshmand Suvrit Sra | 2024/2/13 |
Invex Programs: First Order Algorithms and Their Convergence | arXiv preprint arXiv:2307.04456 | Adarsh Barik Suvrit Sra Jean Honorio | 2023/7/10 |
Sion’s minimax theorem in geodesic metric spaces and a Riemannian extragradient algorithm | SIAM Journal on Optimization | Peiyuan Zhang Jingzhao Zhang Suvrit Sra | 2023/12/31 |
Global optimality for Euclidean CCCP under Riemannian convexity | | Melanie Weber Suvrit Sra | 2023/7 |
Toward Understanding State Representation Learning in MuZero: A Case Study in Linear Quadratic Gaussian Control | | Yi Tian Kaiqing Zhang Russ Tedrake Suvrit Sra | 2023/12/13 |
On the training instability of shuffling SGD with batch normalization | | David X Wu Chulhee Yun Suvrit Sra | 2023/2/24 |
Transformers implement functional gradient descent to learn non-linear functions in context | arXiv preprint arXiv:2312.06528 | Xiang Cheng Yuxin Chen Suvrit Sra | 2023/12/11 |
Can Direct Latent Model Learning Solve Linear Quadratic Gaussian Control? | | Yi Tian Kaiqing Zhang Russ Tedrake Suvrit Sra | 2022/12/30 |
Functions with Positive Differences on Convex Cones | Results in Mathematics | Constantin P Niculescu Suvrit Sra | 2023/12 |
How to escape sharp minima | arXiv preprint arXiv:2305.15659 | Kwangjun Ahn Ali Jadbabaie Suvrit Sra | 2023/5/25 |
Linear attention is (maybe) all you need (to understand transformer optimization) | | Kwangjun Ahn Xiang Cheng Minhak Song Chulhee Yun Ali Jadbabaie | 2023/10/2 |
Theory and algorithms for diffusion processes on Riemannian manifolds | arXiv preprint arXiv:2204.13665 | Xiang Cheng Jingzhao Zhang Suvrit Sra | 2022/4/28 |
Understanding Riemannian acceleration via a proximal extragradient framework | | Jikai Jin Suvrit Sra | 2022/6/28 |
Efficient sampling on Riemannian manifolds via Langevin MCMC | Advances in Neural Information Processing Systems | Xiang Cheng Jingzhao Zhang Suvrit Sra | 2022/12/6 |
Sign and basis invariant networks for spectral graph representation learning | ICLR 2023 | Derek Lim Joshua Robinson Lingxiao Zhao Tess Smidt | 2022 |
Understanding the unstable convergence of gradient descent | ICML 2022 (arXiv:2204.01050) | Kwangjun Ahn Jingzhao Zhang Suvrit Sra | 2022/4/3 |
CCCP is Frank-Wolfe in disguise | Advances in Neural Information Processing Systems | Alp Yurtsever Suvrit Sra | 2022/12/6 |
Understanding Nesterov's Acceleration via Proximal Point Method | | Kwangjun Ahn Suvrit Sra | 2022 |
Max-margin contrastive learning | Proceedings of the AAAI Conference on Artificial Intelligence | Anshul Shah Suvrit Sra Rama Chellappa Anoop Cherian | 2022/6/28 |