- VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping (2024, arXiv)
- EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM (2025, ICML)
- MoVA: Adapting Mixture of Vision Experts to Multimodal Context (2024, NeurIPS)
- Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models (2024, NeurIPS)
- Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning (2024, NeurIPS, Spotlight)
Research Experience
- Research Intern in the Base Model Department at SenseTime Research, working closely with Guanglu Song and Yu Liu
- Core member of the founding teams of frontline R&D projects, including a large vision foundation model, a multimodal interactive model, and the AIGC product SenseMirage
Education
- Ph.D.: The Chinese University of Hong Kong, MMLab, Advisor: Prof. Hongsheng Li
- Master's Degree: Beihang University, Advisor: Prof. Biao Leng
- Bachelor's Degree: Beihang University, Advisor: Prof. Biao Leng
Background
- Research Interests: Generative AI, particularly in diffusion models and multimodal large language models
- Professional Field: Visual content generation, multimodal understanding
- Brief Introduction: A third-year Ph.D. student at MMLab, The Chinese University of Hong Kong, supervised by Prof. Hongsheng Li. Received both Bachelor's and Master's degrees from Beihang University, supervised by Prof. Biao Leng.