Co-developed FLUX-Reason-6M, a million-scale text-to-image reasoning dataset, and the PRISM-Bench benchmark (2025)
Proposed CodePlot-CoT for mathematical visual reasoning using code-driven images (2025)
Served as first or co-first author on multiple publications in visual generation and multimodal reasoning
Background
Ph.D. candidate at the Multimedia Laboratory (MMLab), The Chinese University of Hong Kong (CUHK), expected to graduate in 2025
Research driven by a passion for Artificial General Intelligence (AGI), with a focus on visual understanding and generation
Dedicated to building integrated systems capable of perceiving, understanding, and generating visual content using advanced multimodal large language models
Supervised by Prof. Hongsheng Li during the Ph.D., in close collaboration with Prof. Xihui Liu
Former visiting scholar at MIT CSAIL, advised by Prof. Dina Katabi