- Awards: EgoExoLearn received the Egocentric Vision (EgoVis) 2023/2024 Distinguished Paper Award
- Project contributions: Eagle2.5 contributed to NVIDIA Cosmos-Reason1 and NVIDIA Nemotron-H
- Other papers and news:
  - Vinci accepted by IMWUT 2025
  - CG-AV-Counting, an audio-visual counting benchmark
  - Eagle2 adopted by the NVIDIA GEAR Team to develop the robotic foundation model GR00T N1
  - 3 ICLR papers accepted (CG-Bench, EgoHOD, X-Gen)
  - Eagle2 and its model weights released on Hugging Face
  - CG-Bench integrated into VLMEvalKit
  - Vinci, a real-time embodied smart assistant based on an egocentric VLM
  - CG-Bench, a clue-grounded long video understanding benchmark
  - Top-1 rankings in 7 tracks of the 1st EgoVis ECCV 2024 Challenge
  - InternVideo2 accepted by ECCV 2024
  - Video-Mamba-Suite, a suite for modeling video with Mamba
  - 4 CVPR papers accepted (InternVL, MVBench, EgoInstructor, EgoExoLearn)
  - InternVL, a generalist visual-language model
  - Best performance in Temporal Sound Localisation and runner-up in Temporal Action Localisation at the first Perception Test Challenge
  - MAT accepted by ICCV
  - VideoLLM, a novel video sequence understanding framework
  - BasicTAD accepted by CVIU
  - Champion of the WSDM Cup 2023 Toloka VQA Challenge
  - Final Ego4D report and code released
  - Top-1 rankings in 7 tracks of the Ego4D ECCV 2022 Challenge
Research Experience
- Working at NVIDIA on frontier Vision-Language Models, collaborating with Zhiding Yu, Guilin Liu, and other outstanding researchers on Project Eagle
- Previously worked on the Intern series of multimodal large models (including InternVideo, InternVideo2, InternVL, and InternVid)
- Led the adaptation of the base model to egocentric video understanding downstream tasks, winning 14 championships across two Ego4D/EgoVis challenges
Education
- Ph.D. in Computer Science, Nanjing University
- Advisors: Prof. Limin Wang, Prof. Tong Lu
Background
- Research Interests: General visual perception, human-computer interaction, and multimodal interaction systems
- Focus: General video understanding, egocentric vision perception, and multimodal computing