Amir Gholami
University of California, Berkeley
H-index: 36
North America-United States
Top articles of Amir Gholami
Title | Journal | Author(s) | Publication Date |
---|---|---|---|
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement | arXiv preprint arXiv:2403.15042 | Nicholas Lee Thanakul Wattanawong Sehoon Kim Karttikeya Mangalam Sheng Shen | 2024/3/22 |
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization | arXiv preprint arXiv:2401.18079 | Coleman Hooper Sehoon Kim Hiva Mohammadzadeh Michael W Mahoney Yakun Sophia Shao | 2024/1/31 |
An LLM Compiler for Parallel Function Calling | arXiv preprint arXiv:2312.04511 | Sehoon Kim Suhong Moon Ryan Tabrizi Nicholas Lee Michael W Mahoney | 2023/12/7 |
SPEED: Speculative Pipelined Execution for Efficient Decoding | arXiv preprint arXiv:2310.12072 | Coleman Hooper Sehoon Kim Hiva Mohammadzadeh Hasan Genc Kurt Keutzer | 2023/10/18 |
SqueezeLLM: Dense-and-Sparse Quantization | arXiv preprint arXiv:2306.07629 | Sehoon Kim* Coleman Hooper* Amir Gholami* Zhen Dong Xiuyu Li | 2023/6/13 |
Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior | Advances in Neural Information Processing Systems | Shashank Subramanian Peter Harrington Kurt Keutzer Wahid Bhimji Dmitriy Morozov | 2024/2/13 |
End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs | arXiv preprint arXiv:2304.06745 | Javier Campos Zhen Dong Javier Duarte Amir Gholami Michael W Mahoney | 2023/4/13 |
Full Stack Optimization of Transformer Inference: a Survey | arXiv preprint arXiv:2302.14017 | Sehoon Kim Coleman Hooper Thanakul Wattanawong Minwoo Kang Ruohan Yan | 2023/2/27 |
Hessian-aware pruning and optimal neural implant | | Shixing Yu Zhewei Yao Amir Gholami Zhen Dong Sehoon Kim | 2022 |
Adaptive Self-supervision Algorithms for Physics-informed Neural Networks | arXiv preprint arXiv:2207.04084 | Shashank Subramanian Robert M Kirby Michael W Mahoney Amir Gholami | 2022/7/8 |
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition | Advances in Neural Information Processing Systems | Sehoon Kim Amir Gholami Albert Shaw Nicholas Lee Karttikeya Mangalam | 2022/12/6 |
Integer-Only Zero-Shot Quantization for Efficient Speech Recognition | | Sehoon Kim Amir Gholami Zhewei Yao Nicholas Lee Patrick Wang | 2022/5/23 |
A Fast Post-Training Pruning Framework for Transformers | Advances in Neural Information Processing Systems | Woosuk Kwon Sehoon Kim Michael W Mahoney Joseph Hassoun Kurt Keutzer | 2022/12/6 |
Applications and techniques for fast machine learning in science | | Allison McCarn Deiana Nhan Tran Joshua Agar Michaela Blott Giuseppe Di Guglielmo | 2022/4/12 |
HAWQ-V3: Dyadic Neural Network Quantization | | Zhewei Yao Zhen Dong Zhangcheng Zheng Amir Gholami Jiali Yu | 2021/7/1 |
Adahessian: An adaptive second order optimizer for machine learning | AAAI 2021 | Zhewei Yao Amir Gholami Sheng Shen Kurt Keutzer Michael W Mahoney | 2020/6/1 |
Characterizing possible failure modes in physics-informed neural networks | Advances in Neural Information Processing Systems | Aditi Krishnapriyan Amir Gholami Shandian Zhe Robert Kirby Michael W Mahoney | 2021/12/6 |
A Survey of Quantization Methods for Efficient Neural Network Inference | | Amir Gholami Sehoon Kim Zhen Dong Zhewei Yao Michael W Mahoney | 2022/2/22 |
Learned token pruning for transformers | KDD 2022 | Sehoon Kim* Sheng Shen* David Thorsley Amir Gholami Woosuk Kwon | 2021/7/2 |
AI and Memory Wall | IEEE Micro | Amir Gholami Zhewei Yao Sehoon Kim Coleman Hooper Michael W Mahoney | 2024/3/25 |