Beidi Chen
Stanford University
H-index: 16
North America-United States
Top articles of Beidi Chen
Title | Venue | Author(s) | Publication Date |
---|---|---|---|
Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding | arXiv preprint arXiv:2402.12374 | Zhuoming Chen, Avner May, Ruslan Svirschevski, Yuhsun Huang, Max Ryabinin | 2024/2/19 |
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding | arXiv preprint arXiv:2404.11912 | Hanshi Sun, Zhuoming Chen, Xinyu Yang, Yuandong Tian, Beidi Chen | 2024/4/18 |
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference | arXiv preprint arXiv:2402.09398 | Harry Dong, Xinyu Yang, Zhenyu Zhang, Zhangyang Wang, Yuejie Chi | 2024/2/14 |
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length | arXiv preprint arXiv:2404.08801 | Xuezhe Ma, Xiaomeng Yang, Wenhan Xiong, Beidi Chen, Lili Yu | 2024/4/12 |
Laughing Hyena Distillery: Extracting Compact Recurrences from Convolutions | Advances in Neural Information Processing Systems | Stefano Massaroli, Michael Poli, Dan Fu, Hermann Kumbong, Rom Parnichkun | 2024/2/13 |
Prompt-prompted Mixture of Experts for Efficient LLM Generation | arXiv preprint arXiv:2404.01365 | Harry Dong, Beidi Chen, Yuejie Chi | 2024/4/1 |
Learn To be Efficient: Build Structured Sparsity in Large Language Models | arXiv preprint arXiv:2402.06126 | Haizhong Zheng, Xiaoyan Bai, Beidi Chen, Fan Lai, Atul Prakash | 2024/2/9 |
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding | arXiv preprint arXiv:2403.04797 | Zhenyu Zhang, Runjin Chen, Shiwei Liu, Zhewei Yao, Olatunji Ruwase | 2024/3/5 |
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache | arXiv preprint arXiv:2402.02750 | Zirui Liu, Jiayi Yuan, Hongye Jin, Shaochen Zhong, Zhaozhuo Xu | 2024/2/5 |
LLM Inference Unveiled: Survey and Roofline Model Insights | arXiv preprint arXiv:2402.16363 | Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Chenhao Xue | 2024/2/26 |
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer | International Conference on Machine Learning | Yuandong Tian, Yiping Wang, Beidi Chen, Simon Du | 2023/5/25 |
CocktailSGD: Fine-tuning Foundation Models over 500Mbps Networks | | Jue Wang, Yucheng Lu, Binhang Yuan, Beidi Chen, Percy Liang | 2023/7/3 |
JoMA: Demystifying Multilayer Transformers via Joint Dynamics of MLP and Attention | International Conference on Learning Representations (ICLR) | Yuandong Tian, Yiping Wang, Zhenyu Zhang, Beidi Chen, Simon Du | 2023/10/1 |
Compress, Then Prompt: Improving Accuracy-Efficiency Trade-off of LLM Inference with Transferable Prompt | arXiv preprint arXiv:2305.11186 | Zhaozhuo Xu, Zirui Liu, Beidi Chen, Yuxin Tang, Jue Wang | 2023/5/17 |
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time | | Zichang Liu, Jue Wang, Tri Dao, Tianyi Zhou, Binhang Yuan | 2023/7/3 |
Efficient Streaming Language Models with Attention Sinks | arXiv preprint arXiv:2309.17453 | Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, Mike Lewis | 2023/9/29 |
Sample-efficient Surrogate Model for Frequency Response of Linear PDEs using Self-Attentive Complex Polynomials | arXiv preprint arXiv:2301.02747 | Andrew Cohen, Weiping Dou, Jiang Zhu, Slawomir Koziel, Peter Renner | 2023/1/6 |
H2O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models | International Conference on Machine Learning | Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng | 2023/6/24 |
On the Similarity between Attention and SVM on the Token Separation and Selection Behavior | | Beidi Chen, Wentao Guo, Zhihang Li, Zhao Song, Tianyi Zhou | 2023/9/22 |
InRank: Incremental Low-Rank Learning | arXiv preprint arXiv:2306.11250 | Jiawei Zhao, Yifei Zhang, Beidi Chen, Florian Schäfer, Anima Anandkumar | 2023/6/20 |