Beidi Chen
Stanford University
H-index: 16
North America-United States
Top articles of Beidi Chen
Title | Venue | Author(s) | Publication Date |
---|---|---|---|
Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding | arXiv preprint arXiv:2402.12374 | Zhuoming Chen, Avner May, Ruslan Svirschevski, Yuhsun Huang, Max Ryabinin | 2024/2/19 |
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding | arXiv preprint arXiv:2404.11912 | Hanshi Sun, Zhuoming Chen, Xinyu Yang, Yuandong Tian, Beidi Chen | 2024/4/18 |
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference | arXiv preprint arXiv:2402.09398 | Harry Dong, Xinyu Yang, Zhenyu Zhang, Zhangyang Wang, Yuejie Chi | 2024/2/14 |
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length | arXiv preprint arXiv:2404.08801 | Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu | 2024/4/12 |
Laughing Hyena Distillery: Extracting Compact Recurrences from Convolutions | Advances in Neural Information Processing Systems | Stefano Massaroli, Michael Poli, Dan Fu, Hermann Kumbong, Rom Parnichkun | 2024/2/13 |
Prompt-prompted Mixture of Experts for Efficient LLM Generation | arXiv preprint arXiv:2404.01365 | Harry Dong, Beidi Chen, Yuejie Chi | 2024/4/1 |
Learn To be Efficient: Build Structured Sparsity in Large Language Models | arXiv preprint arXiv:2402.06126 | Haizhong Zheng, Xiaoyan Bai, Beidi Chen, Fan Lai, Atul Prakash | 2024/2/9 |
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding | arXiv preprint arXiv:2403.04797 | Zhenyu Zhang, Runjin Chen, Shiwei Liu, Zhewei Yao, Olatunji Ruwase | 2024/3/5 |
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache | arXiv preprint arXiv:2402.02750 | Zirui Liu, Jiayi Yuan, Hongye Jin, Shaochen Zhong, Zhaozhuo Xu | 2024/2/5 |
LLM Inference Unveiled: Survey and Roofline Model Insights | arXiv preprint arXiv:2402.16363 | Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Chenhao Xue | 2024/2/26 |
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer | International Conference on Machine Learning | Yuandong Tian, Yiping Wang, Beidi Chen, Simon Du | 2023/5/25 |
CocktailSGD: Fine-tuning Foundation Models over 500Mbps Networks | | Jue Wang, Yucheng Lu, Binhang Yuan, Beidi Chen, Percy Liang | 2023/7/3 |
JoMA: Demystifying Multilayer Transformers via Joint Dynamics of MLP and Attention | International Conference on Learning Representations (ICLR) | Yuandong Tian, Yiping Wang, Zhenyu Zhang, Beidi Chen, Simon Du | 2023/10/1 |
Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt | arXiv preprint arXiv:2305.11186 | Zhaozhuo Xu, Zirui Liu, Beidi Chen, Yuxin Tang, Jue Wang | 2023/5/17 |
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time | | Zichang Liu, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan | 2023/7/3 |
Efficient Streaming Language Models with Attention Sinks | arXiv preprint arXiv:2309.17453 | Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, Mike Lewis | 2023/9/29 |
Sample-efficient Surrogate Model for Frequency Response of Linear PDEs using Self-Attentive Complex Polynomials | arXiv preprint arXiv:2301.02747 | Andrew Cohen, Weiping Dou, Jiang Zhu, Slawomir Koziel, Peter Renner | 2023/1/6 |
H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models | International Conference on Machine Learning | Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng | 2023/6/24 |
On the Similarity between Attention and SVM on the Token Separation and Selection Behavior | | Beidi Chen, Wentao Guo, Zhihang Li, Zhao Song, Tianyi Zhou | 2023/9/22 |
InRank: Incremental Low-Rank Learning | arXiv preprint arXiv:2306.11250 | Jiawei Zhao, Yifei Zhang, Beidi Chen, Florian Schäfer, Anima Anandkumar | 2023/6/20 |