Amir Gholami

University of California, Berkeley

H-index: 36

North America-United States

About Amir Gholami

Amir Gholami is a research scientist at the University of California, Berkeley, with an h-index of 36 overall and 36 since 2020. He specializes in Machine Learning Systems, High Performance Computing, Parallel Algorithms, and Natural Language Processing.

His recent articles include:

LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

An LLM Compiler for Parallel Function Calling

SPEED: Speculative Pipelined Execution for Efficient Decoding

SqueezeLLM: Dense-and-Sparse Quantization

Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior

End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs

Full Stack Optimization of Transformer Inference: a Survey

Amir Gholami Information

University	University of California, Berkeley
Position	Research Scientist
Citations (all)	7830
Citations (since 2020)	7634
Cited By	1867
h-index (all)	36
h-index (since 2020)	36
i10-index (all)	51
i10-index (since 2020)	49

Amir Gholami Skills & Research Interests

Machine Learning Systems

High Performance Computing

Parallel Algorithms

Natural Language Processing

Top articles of Amir Gholami

Title	Venue	Author(s)	Publication Date
LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement	arXiv preprint arXiv:2403.15042	Nicholas Lee, Thanakul Wattanawong, Sehoon Kim, Karttikeya Mangalam, Sheng Shen, ...	2024/3/22
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization	arXiv preprint arXiv:2401.18079	Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Michael W Mahoney, Yakun Sophia Shao, ...	2024/1/31
An LLM Compiler for Parallel Function Calling	arXiv preprint arXiv:2312.04511	Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W Mahoney, ...	2023/12/7
SPEED: Speculative Pipelined Execution for Efficient Decoding	arXiv preprint arXiv:2310.12072	Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Hasan Genc, Kurt Keutzer, ...	2023/10/18
SqueezeLLM: Dense-and-Sparse Quantization	arXiv preprint arXiv:2306.07629	Sehoon Kim*, Coleman Hooper*, Amir Gholami*, Zhen Dong, Xiuyu Li, ...	2023/6/13
Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior	Advances in Neural Information Processing Systems	Shashank Subramanian, Peter Harrington, Kurt Keutzer, Wahid Bhimji, Dmitriy Morozov, ...	2024/2/13
End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs	arXiv preprint arXiv:2304.06745	Javier Campos, Zhen Dong, Javier Duarte, Amir Gholami, Michael W Mahoney, ...	2023/4/13
Full Stack Optimization of Transformer Inference: a Survey	arXiv preprint arXiv:2302.14017	Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, ...	2023/2/27
Hessian-aware pruning and optimal neural implant		Shixing Yu, Zhewei Yao, Amir Gholami, Zhen Dong, Sehoon Kim, ...	2022
Adaptive Self-supervision Algorithms for Physics-informed Neural Networks	arXiv preprint arXiv:2207.04084	Shashank Subramanian, Robert M Kirby, Michael W Mahoney, Amir Gholami	2022/7/8
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition	Advances in Neural Information Processing Systems	Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, ...	2022/12/6
Integer-Only Zero-Shot Quantization for Efficient Speech Recognition		Sehoon Kim, Amir Gholami, Zhewei Yao, Nicholas Lee, Patrick Wang, ...	2022/5/23
A Fast Post-Training Pruning Framework for Transformers	Advances in Neural Information Processing Systems	Woosuk Kwon, Sehoon Kim, Michael W Mahoney, Joseph Hassoun, Kurt Keutzer, ...	2022/12/6
Applications and techniques for fast machine learning in science		Allison McCarn Deiana, Nhan Tran, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, ...	2022/4/12
HAWQ-V3: Dyadic Neural Network Quantization		Zhewei Yao, Zhen Dong, Zhangcheng Zheng, Amir Gholami, Jiali Yu, ...	2021/7/1
Adahessian: An adaptive second order optimizer for machine learning	AAAI 2021	Zhewei Yao, Amir Gholami, Sheng Shen, Kurt Keutzer, Michael W Mahoney	2020/6/1
Characterizing possible failure modes in physics-informed neural networks	Advances in Neural Information Processing Systems	Aditi Krishnapriyan, Amir Gholami, Shandian Zhe, Robert Kirby, Michael W Mahoney	2021/12/6
A Survey of Quantization Methods for Efficient Neural Network Inference		Amir Gholami, Sehoon Kim, Zhen Dong, Zhewei Yao, Michael W Mahoney, ...	2022/2/22
Learned token pruning for transformers	KDD 2022	Sehoon Kim*, Sheng Shen*, David Thorsley, Amir Gholami, Woosuk Kwon, ...	2021/7/2
AI and Memory Wall	IEEE Micro	Amir Gholami, Zhewei Yao, Sehoon Kim, Coleman Hooper, Michael W Mahoney, ...	2024/3/25