Publications
- FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion (Preprint, 2024)
- Characterizing Structural Regularities of Labeled Data in Overparameterized Models (ICML, 2021)
- Learning to Optimize Tensor Programs (NeurIPS, 2018)
- TVM: An Automated End-to-End Optimizing Compiler for Deep Learning (OSDI, 2018)
- Efficient Deep Learning Inference on Edge Devices (MLSys, 2018)
Research Experience
He has been heavily involved in projects such as MegaScale, Apache TVM, and Apache MXNet.
Education
He received his Ph.D. from the Paul G. Allen School of Computer Science & Engineering at the University of Washington, where he was advised by Luis Ceze and Tianqi Chen. He earned his Bachelor's degree from Fudan University, where he was a member of the Fudan NLP Lab, working with Xipeng Qiu and Zheng Zhang.
Background
Ziheng works on large language model (LLM) systems at ByteDance, focusing on scaling and optimizing LLM training and inference. His research interests include Machine Learning Systems and Large Language Models.