Publications
[ICML 2025] Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity
[ICML 2025] QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache
[ICLR 2025] COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
[CVPR 2025] NVILA: Efficient Frontier Visual Language Models
[ICML 2025] SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference
[NeurIPS 2023] Training Transformers with 4-bit Integers
[ICML 2024] Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization
[ICML 2025] Oscillation-Reduced MXFP4 Training for Vision Transformers
Research Experience
Research Intern at NVIDIA (Feb 2024 - Aug 2024). Proposed COAT, a memory-efficient FP8 training method, published as a first-author paper at ICLR 2025; also contributed to the NVILA project, with responsibility for FP8 training of vision-language models.
Education
Ph.D. student in Computer Science at the University of California, Berkeley, advised by Prof. Kurt Keutzer.
B.E. in Computer Science from Yao Class, Tsinghua University, advised by Prof. Andrew Yao, Prof. Jianfei Chen, and Prof. Jun Zhu.
Worked closely with Prof. Song Han at MIT and was advised by Prof. Sheng Wang at the University of Washington.
Background
An MLSys researcher interested in efficient training and inference of large language models and diffusion models.
Miscellany
Interests include soccer, badminton, pool, and photography.