- AUTOHALLUSION: Automatic Generation of Hallucination Benchmarks for Vision-Language Models. EMNLP 2024
- ARDuP: Active Region Video Diffusion for Universal Policies. IROS 2024 (Oral Presentation)
- What is Point Supervision Worth in Video Instance Segmentation?
Research Experience
- Research Engineer at Meta FAIR Robotics, World Model Team, July 2025 - present. Working on vision-language-latent-action models for general robot manipulation.
- Research Intern at NVIDIA, May - Aug 2023. Worked on video diffusion for robot manipulation. Mentors: De-An Huang, Linxi 'Jim' Fan, Yuke Zhu.
- Research Intern at NVIDIA, May - Nov 2022. Worked on video instance segmentation with point supervision. Mentors: Zhiding Yu, De-An Huang, Shiyi Lan.
Education
- Ph.D. in Computer Science from the University of Maryland, College Park, advised by Prof. Abhinav Shrivastava.
- M.S. in Computer Science from ShanghaiTech University, advised by Prof. Xuming He.
- B.Eng. in Software Engineering from Tongji University.
Background
- Research Interests: Intersection of Computer Vision and Autonomous Agents, focusing on video understanding, object recognition, policy learning, and generative world modeling.
- Goal: To build foundation models that seamlessly integrate visual, linguistic, and action understanding for real-world applications.
Miscellany
Looking for full-time positions in industry. Please feel free to contact me.