Zili Zhang

Google Scholar ID: 310QUvQAAAAJ

Peking University

Distributed system, Deep learning

Citations & Impact

All-time

Citations

742

Co-authors

list available

Contact

Email: zhangzili1201@gmail.com

Publications

10 items

BigMac: Breaking the Pareto Frontier of Compute and Memory in Multimodal LLM Training

2026

ReLibra: Routing-Replay-Guided Load Balancing for MoE Training in Reinforcement Learning

2026

Heddle: A Distributed Orchestration System for Agentic RL Rollout

2026

TokenLake: A Unified Segment-level Prefix Cache Pool for Fine-grained Elastic Long-Context LLM Serving

2025

Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model

2025

Label-efficient Single Photon Images Classification via Active Learning

2025

StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation

2025

Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction

2025

Resume (English only)

Academic Achievements

Published multiple papers as first author or co-author in top-tier venues, including NSDI, OSDI, SIGCOMM, and TOCS, such as:
“TokenLake: A Unified Segment-level Prefix Cache Pool for Fine-grained Elastic Long-Context LLM Serving” (Preprint)
“StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation” (Preprint)
“RAGCache: Efficient Knowledge Caching for Retrieval-Augmented Generation” (TOCS 2025, to appear)
“Fast Distributed Inference Serving for Large Language Models” (NSDI 2026, to appear, equal contribution)
“DistTrain: Addressing Model and Data Heterogeneity with Disaggregated Training for Multimodal Large Language Models” (SIGCOMM 2025)
“RLHFuse: Efficient RLHF Training for Large Language Models with Inter- and Intra-Stage Fusion” (NSDI 2025)
“dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving” (OSDI 2024)
“Jolteon: Unleashing the Promise of Serverless for Serverless Workflows” (NSDI 2024)
“Fast Vector Query Processing for Large Datasets Beyond GPU Memory with Reordered Pipelining” (NSDI 2024)
“Ditto: Efficient Serverless Analytics with Elastic Parallelism” (SIGCOMM 2023)
“Fast, Approximate Vector Queries on Very Large Unstructured Datasets” (NSDI 2023)
“Transparent GPU Sharing in Container Clouds for Deep Learning Workloads” (NSDI)

Co-authors

15 total