Beidi Chen
Stanford University
H-index: 16
United States
Top articles of Beidi Chen
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
arXiv preprint arXiv:2404.11912
2024/4/18
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length
arXiv preprint arXiv:2404.08801
2024/4/12
Prompt-prompted Mixture of Experts for Efficient LLM Generation
arXiv preprint arXiv:2404.01365
2024/4/1
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding
arXiv preprint arXiv:2403.04797
2024/3/5
LLM Inference Unveiled: Survey and Roofline Model Insights
arXiv preprint arXiv:2402.16363
2024/2/26
Sequoia: Scalable, Robust, and Hardware-aware Speculative Decoding
arXiv preprint arXiv:2402.12374
2024/2/19
Get More with LESS: Synthesizing Recurrence with KV Cache Compression for Efficient LLM Inference
arXiv preprint arXiv:2402.09398
2024/2/14
Laughing Hyena Distillery: Extracting Compact Recurrences from Convolutions
Advances in Neural Information Processing Systems
2024/2/13
Learn To be Efficient: Build Structured Sparsity in Large Language Models
arXiv preprint arXiv:2402.06126
2024/2/9
KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache
arXiv preprint arXiv:2402.02750
2024/2/5
HexGen: Generative Inference of Foundation Model over Heterogeneous Decentralized Environment
arXiv preprint arXiv:2311.11514
2023/11/20
JoMA: Demystifying Multilayer Transformers via Joint Dynamics of MLP and Attention
International Conference on Learning Representations (ICLR)
2023/10/1
Efficient Streaming Language Models with Attention Sinks
arXiv preprint arXiv:2309.17453
2023/9/29
On the Similarity between Attention and SVM on the Token Separation and Selection Behavior
2023/9/22
Towards Structured Sparsity in Transformers for Efficient Inference
2023/7/16
Fast Algorithms for a New Relaxation of Optimal Transport
2023/7/12
CocktailSGD: Fine-tuning Foundation Models over 500Mbps Networks
2023/7/3
Deja Vu: Contextual Sparsity for Efficient LLMs at Inference Time
2023/7/3
H₂O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models
International Conference on Machine Learning
2023/6/24
InRank: Incremental Low-Rank Learning
arXiv preprint arXiv:2306.11250
2023/6/20