Papers accepted: 'Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis' (CVPR 2025); 'VividMed: Vision Language Model with Versatile Visual Grounding for Medicine' (NAACL 2025); 'CORE4D: A 4D Human-Object-Human Interaction Dataset for Collaborative Object REarrangement' (CVPR 2025).
Research Experience
Research intern at Stanford University, advised by Serena Yeung-Levy; also collaborates closely with Saining Xie and has worked with Ting Chen.
Education
Undergraduate student in Computer Science and Technology at Tsinghua University
Background
An undergraduate student majoring in computer science and technology at Tsinghua University, focusing on multimodal learning and generative modeling, with applications in both the arts and sciences.