
Haocheng Xi

Google Scholar ID: klZ2MMcAAAAJ
University of California, Berkeley
Efficient ML
Citations & Impact (all-time)
  • Citations: 416
  • H-index: 7
  • i10-index: 7
  • Publications: 15
  • Co-authors: 0
Resume
Academic Achievements
  • [ICML 2025] Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity
  • [ICML 2025] QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache
  • [ICLR 2025] COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
  • [CVPR 2025] NVILA: Efficient Frontier Visual Language Models
  • [ICML 2025] SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference
  • [NeurIPS 2023] Training Transformers with 4-bit Integers
  • [ICML 2024] Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization
  • [ICML 2025] Oscillation-Reduced MXFP4 Training for Vision Transformers
Research Experience
  • Research Intern at NVIDIA (Feb 2024 - Aug 2024): proposed COAT, a memory-efficient FP8 training method, published as a first-author paper at ICLR 2025, and contributed to the NVILA project, with responsibility for FP8 training of vision language models.
Education
  • Ph.D. student in Computer Science at the University of California, Berkeley, advised by Prof. Kurt Keutzer.
  • B.E. in Computer Science from Yao Class, Tsinghua University, advised by Prof. Andrew Yao, Prof. Jianfei Chen, and Prof. Jun Zhu.
  • Worked closely with Prof. Song Han at MIT and was advised by Prof. Sheng Wang at the University of Washington.
Background
  • An MLSys researcher interested in efficient training and inference of large language models and diffusion models.
Miscellany
  • Interests include soccer, badminton, pool, and photography.