First-author paper 'FlexLLM: Token-Level Co-Serving of LLM Inference and Finetuning with SLO Guarantees', NSDI 2026
First-author paper 'SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference', NeurIPS 2025 (Spotlight Award)
Co-authored 'SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification', ASPLOS 2024 (350+ citations)
Co-authored 'Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models', ACL 2024 Oral (Outstanding Paper Award)
Contributed to multiple high-impact publications on LLM inference acceleration, speculative decoding, and efficient serving systems (e.g., EuroSys 2026, NeurIPS 2025, arXiv 2025, ASPLOS 2024, SIGCOMM 2023)