Published papers at top conferences such as ICML, ICLR, and HPCA, including:
- SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer
- Sparse Video-Gen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity
- QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
- LServe: Efficient Long-Sequence LLM Serving with Unified Sparse Attention
- SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers
- LEGO: Spatial Accelerator Generation and Optimization for Tensor Applications
- VideoTime³: A 40-uJ/frame 38 FPS Video Understanding Accelerator With Real-Time DiffFrame Temporal Redundancy Reduction and Temporal Modeling
- TorchSparse: Efficient Point Cloud Inference Engine
- QuantumNAS: Noise-Adaptive Search for Robust Quantum Circuits
- PointAcc: Efficient Point Cloud Accelerator
- NAAS: Neural Accelerator Architecture Search
Selected as a 2021 DAC Young Fellow and a winner of the 2021 Qualcomm Innovation Fellowship.
Research Experience
Currently working as a research scientist at NVIDIA.
Education
PhD from MIT EECS (HAN Lab), advised by Prof. Song Han; B.Eng. in Electronic Engineering from Tsinghua University.
Background
Currently a research scientist at NVIDIA. Research area is efficient deep learning, with a special focus on the co-design of algorithms, systems, and hardware for foundation models (diffusion models, LLMs, etc.).