GroundingBooth: Introduced a framework for text-to-image customization that achieves zero-shot, instance-level spatial grounding of both foreground subjects and background objects. Mixed-View Panorama Synthesis using Geospatially Guided Diffusion: Proposed a diffusion-based method, guided by geospatial information, for synthesizing panoramas in the mixed-view setting.
Research Experience
During an internship at Bosch Research, developed a world-model-based framework that unifies trajectory planning with autoregressive future-image generation, enhanced by Chain-of-Thought reasoning within a single vision-language model (VLM). Also contributing to an ongoing project on a framework that leverages the physics understanding of vision-language models to enable video generation with physically consistent motion and accurate 3D dynamics.
Education
Ph.D. candidate in Computer Science at Washington University in St. Louis, advised by Prof. Nathan Jacobs. Earned a Bachelor's degree in Electrical and Information Engineering from Tianjin University. Previously worked at the Institute of Automation, Chinese Academy of Sciences (CASIA), collaborating with Prof. Jinqiao Wang and Dr. Xu Zhao.
Background
Research interests: computer vision and multi-modal learning, with a focus on generative models and AIGC-related topics. Specifically: (1) unifying vision understanding and generation, including world models for applications such as autonomous driving; (2) controllable and personalized image/video generation and editing; (3) integrating vision-language models with generative modeling; (4) generative AI for 3D vision, including neural rendering, cross-view synthesis, and novel view synthesis. Also interested in geometric computer vision and its combination with generative models.
Miscellany
Actively looking for 2026 spring/summer research internship opportunities.