TokenLake: A Unified Segment-level Prefix Cache Pool for Fine-grained Elastic Long-Context LLM Serving (Preprint)
Fast Distributed Inference Serving for Large Language Models (NSDI 2026, to appear)
Optimizing RLHF Training for Large Language Models with Stage Fusion (NSDI 2025)
StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation (Preprint)
LoongServe: Efficiently Serving Long-Context Large Language Models with Elastic Sequence Parallelism (SOSP 2024)
dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving (OSDI 2024)
A Survey of Resource-efficient LLM and Multimodal Foundation Models (Preprint)
XRON: A Hybrid Elastic Cloud Overlay Network for Video Conferencing at Planetary Scale (SIGCOMM 2023)
Transparent GPU Sharing in Container Clouds for Deep Learning Workloads (NSDI 2023)
AMOS: Enabling Automatic Mapping for Tensor Computations On Spatial Accelerators with Hardware Abstraction (ISCA 2022)
NeoFlow: A Flexible Framework for Enabling Efficient Compilation for High Performance DNN Training (TPDS 2021)
Research Experience
StepFun, Research Intern, Oct. 2024 - Sept. 2025
Meituan, Research Intern, Jun. 2024 - Oct. 2024
Shanghai AI Lab, NDS Group, Research Intern, Oct. 2023 - Jun. 2024
Alibaba, DAMO Academy, Research Intern, Nov. 2021 - Sept. 2023
Tencent, TEG, Research Intern, Apr. 2021 - Jun. 2021
Background
Research interests include machine learning systems, distributed systems, and cloud computing. Received a Bachelor's degree in Computer Science and Technology (Summa Cum Laude) from the Turing Class, Peking University; currently a Ph.D. candidate.
Miscellany
Served as a Shadow Program Committee member for the European Conference on Computer Systems (EuroSys) in 2025 and 2026, and as a reviewer for IEEE Transactions on Networking (IEEE ToN) in 2025. Also worked as a Teaching Assistant for Introduction to Computation (A) in Fall 2022.