FlashInfer: Efficient and Customizable Attention Engine for LLM Inference Serving (MLSys 2025 Best Paper Award)
Research Experience
Led the SGLang project, driving the roadmap, coordination, and execution across community collaborations that pushed the frontier of open-source inference engines.
Background
Principal AI Researcher at Together AI and Core Maintainer of SGLang. Initiated and led the end-to-end DeepSeek V3/R1 effort on SGLang, from day-0 support and performance optimization to large-scale expert-parallel (EP) deployment and GB200 NVL72 integration. Contributions to AI infrastructure recognized by the U.S. government with O-1A and EB-1A extraordinary ability classifications.
Miscellany
Featured in The New York Times discussing the rise of DeepSeek and its competition with Silicon Valley giants.