Mingfei Han

Google Scholar ID: wJEoIXsAAAAJ
MBZUAI; University of Technology Sydney; Bytedance Seed; MMLab, SIAT
Object Recognition · Video Understanding · Vision Language Models · Robotics
Citations & Impact (all-time)
  • Citations: 1,047
  • H-index: 12
  • i10-index: 12
  • Publications: 20
  • Co-authors: 9
Resume (English only)
Academic Achievements
  • Won first-place awards in two tracks at IROS 2025.
  • Two papers accepted to NeurIPS 2025.
  • RoomTour3D project showcased at CVPR 2025.
  • Shot2Story project presented at ICLR 2025.
  • LongVLM paper presented as an oral at ECCV 2024.
  • Published multiple papers, including 'Self-Consistency as a Free Lunch: Reducing Hallucinations in Vision-Language Models via Self-Reflection' and 'RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation'.
Research Experience
  • Currently a postdoctoral researcher at Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI).
  • Worked closely with Heng Wang, Linjie Yang, and Xiaojie Jin on various video-language projects at Bytedance Seed.
Education
  • Ph.D. from the University of Technology Sydney, advised by Prof. Xiaojun Chang.
  • Master's degree from the University of Chinese Academy of Sciences (UCAS).
  • Bachelor's degree from Nankai University (NKU), graduated with honors.
  • Spent two years at Monash University, and was a visiting student at MMLab, SIAT, Chinese Academy of Sciences, working with Prof. Yu Qiao and Prof. Yali Wang.
Background
  • My research interests lie at the intersection of computer vision and robotics, particularly large vision–language models, video summarization, and the analysis of hallucination behavior in these models. My recent work spans video–language understanding, with a focus on long-video understanding and video grounding tasks such as Referring Video Object Segmentation, as well as vision–language navigation and manipulation for robots.