Won first-place awards in two tracks at IROS 2025. Two papers accepted to NeurIPS 2025. The RoomTour3D project was showcased at CVPR 2025, and the Shot2Story project was presented at ICLR 2025. The LongVLM paper received an oral presentation at ECCV 2024. Published multiple papers, including 'Self-Consistency as a Free Lunch: Reducing Hallucinations in Vision-Language Models via Self-Reflection' and 'RoomTour3D: Geometry-Aware Video-Instruction Tuning for Embodied Navigation'.
Research Experience
I am currently a postdoctoral researcher at Mohamed Bin Zayed University of Artificial Intelligence. Previously, I worked closely with Heng Wang, Linjie Yang, and Xiaojie Jin on video-language projects at ByteDance Seed.
Education
I received my Ph.D. from the University of Technology Sydney, advised by Prof. Xiaojun Chang; my Master's degree from the University of Chinese Academy of Sciences (UCAS); and my Bachelor's degree, with honors, from Nankai University (NKU). I also spent two years at Monash University and was a visiting student at MMLab, SIAT, Chinese Academy of Sciences, where I worked with Prof. Yu Qiao and Prof. Yali Wang.
Background
My research interests lie at the intersection of computer vision and robotics, particularly large vision–language models, video summarization, and the analysis of hallucination in these models. My recent work spans video–language understanding, with a focus on long-video understanding; video grounding tasks such as Referring Video Object Segmentation; and vision–language navigation and manipulation for robots.