Publications
- VMoBA: Mixture-of-Block Attention for Video Diffusion Models. CVPR 2025.
- DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation. NeurIPS 2024 (Spotlight).
- MotionBooth: Motion-Aware Customized Text-to-Video Generation. CVPR 2024.
- Towards Language-Driven Video Inpainting via Multimodal Large Language Models. TPAMI.
- Towards Open Vocabulary Learning: A Survey. TPAMI.
- Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation. ICCV 2023.
- Towards Robust Referring Image Segmentation. TIP.
Background
I am a PhD student at the School of Intelligence Science and Technology, Peking University (PKU), advised by Prof. Yunhai Tong. My research focuses on leveraging AIGC technologies to build practical tools that improve people's daily lives and drive innovation in academia. My primary research areas are multimodal learning and the controllable generation of images, videos, and artistic content.
Miscellany
I will graduate in Summer 2026 and am currently looking for job opportunities. Please feel free to contact me via email.