Extensive publications in top-tier venues including NeurIPS, ICCV, CVPR, ICLR, ECCV, T-PAMI, and IJCV
2025: 6 papers accepted at NeurIPS, 2 at T-PAMI, 5 at ICCV, 2 at ICML, 6 at CVPR, 6 at ICLR
2024: 6 papers accepted at ECCV (including InternVideo2), 10 at CVPR, 2 at ICLR (InternVid & SparseFormer), multiple at T-PAMI (extensions of PDPP, STMixer, MixFormer), and VLG work at IJCV
2023: Best Paper Honorable Mention Award at ACM MM 2023 (RefineTAD); 10 papers at ICCV (including the UMT foundation model and the SportsMOT dataset); MixFormer V2 at NeurIPS; LogN and a crowded pose estimation paper at IJCV; CamLiFlow, Dynamic MDETR, and a survey on 3D human mesh recovery at T-PAMI
2022: VideoMAE and PointTAD accepted at NeurIPS, one paper at ECCV, CDG and LIP extension at IJCV, FineAction dataset at TIP
Developed influential models and datasets: MixFormer series (including V2), VideoMAE V2 (first billion-scale video transformer), VideoChat (video foundation model + LLM), Grasp Anything (embodied AI), InternVid dataset, SportsMOT dataset, etc.
Invited as Associate Editor of T-PAMI in 2025
Appointed Editorial Board Member of IJCV in 2023
Delivered an ARP talk at VALSE 2023
Organized the 2nd DeeperAction Challenge at ECCV 2022