Amir Gholami

University of California, Berkeley

H-index: 36

North America, United States

About Amir Gholami

Amir Gholami is a researcher at the University of California, Berkeley, specializing in Machine Learning Systems, High Performance Computing, Parallel Algorithms, and Natural Language Processing. His h-index is 36, both overall and for citations since 2020.

His recent articles include:

LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

An LLM Compiler for Parallel Function Calling

SPEED: Speculative Pipelined Execution for Efficient Decoding

SqueezeLLM: Dense-and-Sparse Quantization

Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior

End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs

Full Stack Optimization of Transformer Inference: a Survey

Amir Gholami Information

University

University of California, Berkeley

Position

Research Scientist

Citations (all)

7830

Citations (since 2020)

7634

Cited By

1867

h-index (all)

36

h-index (since 2020)

36

i10-index (all)

51

i10-index (since 2020)

49

University Profile Page

University of California, Berkeley

Amir Gholami Skills & Research Interests

Machine Learning Systems

High Performance Computing

Parallel Algorithms

Natural Language Processing

Top articles of Amir Gholami

Title

Journal

Author(s)

Publication Date

LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement

arXiv preprint arXiv:2403.15042

Nicholas Lee, Thanakul Wattanawong, Sehoon Kim, Karttikeya Mangalam, Sheng Shen, ...

2024/3/22

KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

arXiv preprint arXiv:2401.18079

Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Michael W Mahoney, Yakun Sophia Shao, ...

2024/1/31

An LLM Compiler for Parallel Function Calling

arXiv preprint arXiv:2312.04511

Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W Mahoney, ...

2023/12/7

SPEED: Speculative Pipelined Execution for Efficient Decoding

arXiv preprint arXiv:2310.12072

Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Hasan Genc, Kurt Keutzer, ...

2023/10/18

SqueezeLLM: Dense-and-Sparse Quantization

arXiv preprint arXiv:2306.07629

Sehoon Kim*, Coleman Hooper*, Amir Gholami*, Zhen Dong, Xiuyu Li, ...

2023/6/13

Towards Foundation Models for Scientific Machine Learning: Characterizing Scaling and Transfer Behavior

Advances in Neural Information Processing Systems

Shashank Subramanian, Peter Harrington, Kurt Keutzer, Wahid Bhimji, Dmitriy Morozov, ...

2024/2/13

End-to-end codesign of Hessian-aware quantized neural networks for FPGAs and ASICs

arXiv preprint arXiv:2304.06745

Javier Campos, Zhen Dong, Javier Duarte, Amir Gholami, Michael W Mahoney, ...

2023/4/13

Full Stack Optimization of Transformer Inference: a Survey

arXiv preprint arXiv:2302.14017

Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, ...

2023/2/27

Hessian-aware pruning and optimal neural implant

Shixing Yu, Zhewei Yao, Amir Gholami, Zhen Dong, Sehoon Kim, ...

2022

Adaptive Self-supervision Algorithms for Physics-informed Neural Networks

arXiv preprint arXiv:2207.04084

Shashank Subramanian, Robert M Kirby, Michael W Mahoney, Amir Gholami

2022/7/8

Squeezeformer: An Efficient Transformer for Automatic Speech Recognition

Advances in Neural Information Processing Systems

Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, ...

2022/12/6

Integer-Only Zero-Shot Quantization for Efficient Speech Recognition

Sehoon Kim, Amir Gholami, Zhewei Yao, Nicholas Lee, Patrick Wang, ...

2022/5/23

A Fast Post-Training Pruning Framework for Transformers

Advances in Neural Information Processing Systems

Woosuk Kwon, Sehoon Kim, Michael W Mahoney, Joseph Hassoun, Kurt Keutzer, ...

2022/12/6

Applications and techniques for fast machine learning in science

Allison McCarn Deiana, Nhan Tran, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, ...

2022/4/12

HAWQ-V3: Dyadic Neural Network Quantization

Zhewei Yao, Zhen Dong, Zhangcheng Zheng, Amir Gholami, Jiali Yu, ...

2021/7/1

Adahessian: An adaptive second order optimizer for machine learning

AAAI 2021

Zhewei Yao, Amir Gholami, Sheng Shen, Kurt Keutzer, Michael W Mahoney

2020/6/1

Characterizing possible failure modes in physics-informed neural networks

Advances in Neural Information Processing Systems

Aditi Krishnapriyan, Amir Gholami, Shandian Zhe, Robert Kirby, Michael W Mahoney

2021/12/6

A Survey of Quantization Methods for Efficient Neural Network Inference

Amir Gholami, Sehoon Kim, Zhen Dong, Zhewei Yao, Michael W Mahoney, ...

2022/2/22

Learned token pruning for transformers

KDD 2022

Sehoon Kim*, Sheng Shen*, David Thorsley, Amir Gholami, Woosuk Kwon, ...

2021/7/2

AI and Memory Wall

IEEE Micro

Amir Gholami, Zhewei Yao, Sehoon Kim, Coleman Hooper, Michael W Mahoney, ...

2024/3/25

Co-Authors

Pieter Abbeel

University of California, Berkeley

H-index: 154

Ion Stoica

University of California, Berkeley

H-index: 153

Kurt Keutzer

University of California, Berkeley

H-index: 101

Michael Mahoney

University of California, Berkeley

H-index: 75

Joseph E. Gonzalez

University of California, Berkeley

H-index: 68

Bichen Wu

University of California, Berkeley

H-index: 27
