Sanjeev Arora
Princeton University
H-index: 75
North America-United States
Top articles of Sanjeev Arora
Title | Venue | Author(s) | Publication Date |
---|---|---|---|
Fine-tuning language models with just forward passes | Advances in Neural Information Processing Systems | Sadhika Malladi, Tianyu Gao, Eshaan Nichani, Alex Damian, Jason D Lee | 2024/2/13 |
Why (and When) does Local SGD Generalize Better than SGD? | arXiv preprint arXiv:2303.01215 | Xinran Gu, Kaifeng Lyu, Longbo Huang, Sanjeev Arora | 2023/3/2 |
A theory for emergence of complex skills in language models | arXiv preprint arXiv:2307.15936 | Sanjeev Arora, Anirudh Goyal | 2023/7/29 |
Trainable transformer in transformer | arXiv preprint arXiv:2307.01189 | Abhishek Panigrahi, Sadhika Malladi, Mengzhou Xia, Sanjeev Arora | 2023/7/3 |
Task-specific skill localization in fine-tuned language models | | Abhishek Panigrahi, Nikunj Saunshi, Haoyu Zhao, Sanjeev Arora | 2023/7/3 |
A kernel-based view of language model fine-tuning | | Sadhika Malladi, Alexander Wettig, Dingli Yu, Danqi Chen, Sanjeev Arora | 2023/7/3 |
Understanding the generalization benefit of normalization layers: Sharpness reduction | Advances in Neural Information Processing Systems | Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora | 2022/12/6 |
New Definitions and Evaluations for Saliency Methods: Staying Intrinsic, Complete and Sound | Advances in Neural Information Processing Systems | Arushi Gupta, Nikunj Saunshi, Dingli Yu, Kaifeng Lyu, Sanjeev Arora | 2022/12/6 |
On the SDEs and scaling rules for adaptive gradient algorithms | Advances in Neural Information Processing Systems | Sadhika Malladi, Kaifeng Lyu, Abhishek Panigrahi, Sanjeev Arora | 2022/12/6 |
Understanding Influence Functions and Datamodels via Harmonic Analysis | | Nikunj Saunshi, Arushi Gupta, Mark Braverman, Sanjeev Arora | 2022/9/29 |
Understanding contrastive learning requires incorporating inductive biases | | Nikunj Saunshi, Jordan Ash, Surbhi Goel, Dipendra Misra, Cyril Zhang | 2022/6/28 |
Understanding gradient descent on the edge of stability in deep learning | | Sanjeev Arora, Zhiyuan Li, Abhishek Panigrahi | 2022/6/28 |
What Happens after SGD Reaches Zero Loss?--A Mathematical Framework | arXiv preprint arXiv:2110.06914 | Zhiyuan Li, Tianhao Wang, Sanjeev Arora | 2021/10/13 |
Opening the Black Box of Deep Learning: Some Lessons and Take-aways | | Sanjeev Arora | 2021/5/31 |
Evaluating gradient inversion attacks and defenses in federated learning | Advances in Neural Information Processing Systems | Yangsibo Huang, Samyak Gupta, Zhao Song, Kai Li, Sanjeev Arora | 2021/12/6 |
Rip van Winkle's Razor: A Simple Estimate of Overfit to Test Data | arXiv preprint arXiv:2102.13189 | Sanjeev Arora, Yi Zhang | 2021/2/25 |
Gradient descent on two-layer nets: Margin maximization and simplicity bias | Advances in Neural Information Processing Systems | Kaifeng Lyu, Zhiyuan Li, Runzhe Wang, Sanjeev Arora | 2021/12/6 |
On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs) | Advances in Neural Information Processing Systems | Zhiyuan Li, Sadhika Malladi, Sanjeev Arora | 2021/12/6 |
Technical perspective: Why don't today's deep nets overfit to their training data? | Communications of the ACM | Sanjeev Arora | 2021/2/22 |
On Predicting Generalization using GANs | arXiv preprint arXiv:2111.14212 | Yi Zhang, Arushi Gupta, Nikunj Saunshi, Sanjeev Arora | 2021/11/28 |