Published several papers, including 'MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources' and 'VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding', among others; received honors including a NeurIPS 2025 Oral, CVPR 2025 Highlight, CVPR 2024 Highlight, ACL 2023 Area Chair Award, and a Best Paper Nomination.
Research Experience
Contributed to multiple research projects, including MMR1, VideoLLaMA 3, and Inf-CLIP, and has presented papers at international conferences.
Education
Ph.D. student at Nanyang Technological University, supervised by Prof. Lu Shijian (NTU) and Dr. Bing Lidong (Alibaba DAMO Academy), specializing in deep learning and multimodal AI.
Background
Research Interests: Multimodal and Embodied AI, specifically Language+Vision and Language+Vision+Action. Specialization: Deep learning, with a focus on multimodal and embodied AI research.