FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving (MLSys 2025 Best Paper Award)
Research Experience
Led the SGLang project, driving the roadmap, coordination, and execution across community collaborations that pushed the frontier of open-source inference engines.
Background
Principal AI Researcher at Together AI and Core Maintainer of SGLang. Initiated and led the end-to-end DeepSeek V3/R1 effort on SGLang, from day-0 support and performance optimization to large-scale expert-parallel (EP) deployment and GB200 NVL72 integration. Contributions to AI infrastructure recognized by the U.S. government with O-1A and EB-1A extraordinary ability classifications.
Miscellany
Featured in The New York Times discussing the rise of DeepSeek and its competition with Silicon Valley giants.