[2025/05] Introduced Seaweed-7B, a cost-efficient foundation model for video generation
[2025/03] Released Long Context Tuning (LCT), enabling scene-level video storytelling up to 5 minutes
[2025/04] Delivered a lecture at DeepLearn
[2025/02] Served as Area Chair for ICML 2025, ICCV 2025, and NeurIPS 2025
[2025/01] Introduced Seaweed-APT, a one-step video generation method for high-quality video synthesis
[2024/10] Served as Area Chair for ICLR 2025 and CVPR 2025, and as Action Editor for TMLR
[2024/08] Honored to receive the IJCAI-JAIR Best Paper Award
[2024/07] Grateful to receive the ICML Best Paper Award
[2024/07] Gave a keynote at ICME 2024
[2024/08] Appointed Associate Editor for TPAMI
[2024/01] Served as Area Chair for CVPR 2024 and ICML 2024
[2024/01] MAGVIT-v2, a leading video tokenizer powering VideoPoet and W.A.L.T, was accepted to ICLR 2024
[2023/12] Announced VideoPoet, my primary focus in 2023, from initial design through v0 to its current milestones
[2023/11] Released W.A.L.T, a diffusion-based transformer model for photorealistic video generation in a unified latent space
[2023/11] Released StyleDrop, enabling few-shot personalized text-to-image synthesis
[2023/03] MAGVIT for multi-task video generation was accepted to CVPR 2023 as a Highlight
[2023/03] Served as Area Chair for ICCV 2023 and NeurIPS 2023
[2023/01] Introduced MUSE, a masked vision transformer for text-to-image generation
[2022/12] Served as Area Chair for CVPR 2023
[2022/06] Pyramid Adversarial Training (CVPR'22) selected as a Best Paper Finalist
[2022/06] Released code for ViTGAN (ICLR'22)
[2022/06] Controlled Noisy Web Labels dataset (ICML'20) is now available via TFDS
[2021/10] Joined Carnegie Mellon University as Adjunct Faculty
[2021/09] Received Best Reviewer awards at ICML 2020–2021 and an Outstanding Reviewer award at NeurIPS 2021
[2021/03] Released LeCAM-GAN (CVPR'21), top-ranked on CIFAR-100 and ImageNet (25%)
[2021/05] Gave invited talks on robust deep learning at ICLR 2021 WeaSuL Workshop and CMU LTI
[2020/10] Congrats to Yu Wu on receiving the Google Fellowship 2020
[2020/07] Published our work on robust learning from noisy labels at ICML 2020
[2020/06] Released The Garden of Forking Paths dataset for evaluating multiple plausible futures
[2020/05] Co-organized two CVPR 2020 workshops: AI for Content Creation and Language and Vision
[2019/09] Congrats to intern Junwei Liang on receiving the Baidu Scholarship 2019
[2019/09] Served as panelist for NSF America's Seed Fund (SBIR) on AI
[2019/07] Best Paper Candidate at ACL 2019 (top 1%)
[2019/05] Released TIRG (CVPR'19) for vision-language image retrieval
[2019/05] Released an activity prediction model (CVPR'19), with demo
[2019/05] Released Eidetic-3D LSTM (ICLR'19)
[2019/03] Gave guest lectures (LTI-11-775) on vision + language at CMU
[2019/01] Released Graph Distillation (ECCV'18) on GitHub
Background
Focused on visual generation and multimodal foundation models. Previously worked at ByteDance/TikTok and Google, and served as an Adjunct Faculty member at Carnegie Mellon University. Research addresses real-world challenges in robust deep learning, generative AI, and large-scale multimodal data.