- VividFace: A Diffusion-Based Hybrid Framework for High-Fidelity Video Face Swapping (2024, arXiv)
- EasyRef: Omni-Generalized Group Image Reference for Diffusion Models via Multimodal LLM (2025, ICML)
- MoVA: Adapting Mixture of Vision Experts to Multimodal Context (2024, NeurIPS)
- Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models (2024, NeurIPS)
- Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought Reasoning (2024, NeurIPS, Spotlight)
Research Experience
- Research Intern in the Base Model Department at SenseTime Research, working closely with Guanglu Song and Yu Liu
- Core member of the founding teams of frontline R&D projects, including a large vision foundation model, a multimodal interactive model, and the AIGC product SenseMirage
Education
- Ph.D.: The Chinese University of Hong Kong, MMLab, Advisor: Prof. Hongsheng Li
- Master's Degree: Beihang University, Advisor: Prof. Biao Leng
- Bachelor's Degree: Beihang University, Advisor: Prof. Biao Leng
Background
- Research Interests: Generative AI, particularly in diffusion models and multimodal large language models
- Professional Field: Visual content generation, multimodal understanding
- Brief Introduction: A third-year Ph.D. student at MMLab, The Chinese University of Hong Kong, supervised by Prof. Hongsheng Li. Received both Bachelor's and Master's degrees from Beihang University, supervised by Prof. Biao Leng.