Publications: VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation (ICLR 2025); EfficientViT-SAM: Accelerated Segment Anything Model Without Accuracy Loss (CVPR 2024 ELVM Workshop); One-2-3-45++: Fast Single Image to 3D Objects with Consistent Multi-View Generation and 3D Diffusion (CVPR 2024); Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud Sequence Representation Learning (CVPR 2023); CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models (CVPR 2025); NVILA: Efficient Frontier Visual Language Models (CVPR 2025); HART: Efficient Visual Generation with Hybrid Autoregressive Transformer (ICLR 2025); Sparse Refinement for Efficient High-resolution Semantic Segmentation (ECCV 2024); Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model (Technical Report).
Research Experience
During undergraduate studies, worked with Prof. Li Yi and Prof. Hao Su on 3D computer vision.
Education
Ph.D. student at MIT EECS, advised by Prof. Song Han; Bachelor's degree in Computer Science from Yao Class, Tsinghua University.
Background
Research Interests: Vision-centric efficient machine learning, especially for foundation models.
Miscellany
Academic Service: Conference reviewer for ICLR, ICML, NeurIPS, CVPR, ICCV, ECCV, etc.