Resume (English only)
Academic Achievements
Published two papers, 'Molmo' and 'Synthetic Visual Genome', at CVPR 2025, one of which is a Best Paper Award Candidate; published one paper at NeurIPS and one at NeurIPS Datasets and Benchmarks (D&B) in 2024; released Molmo, an open state-of-the-art multimodal AI model. Other notable works include BLIP3-KALE and the Certainly Uncertain benchmark.
Research Experience
Internship at Microsoft Research Deep Learning Team (Spring 2022 - Winter 2024); co-organized the ECCV 2024 Workshop on Multimodal Agents and the CVPR 2024 Tutorial on Generalist Agent AI.
Education
Ph.D. in Computer Science and Engineering from the University of Washington, advised by Yejin Choi, Ali Farhadi, and Ranjay Krishna; B.S. in EECS from the University of California, Berkeley, working closely with Anna Rohrbach and Trevor Darrell.
Background
Research Interests: visual perception and language understanding, specifically how machines can reason about the visual world as humans do. Research projects focus on empowering visual commonsense reasoning in AI models; grounding objects, concepts, and actions to images and videos; and evaluating multimodal language models.
Miscellany
Contact information includes email, Google Scholar, and GitHub.