Ziyu Guo
Google Scholar ID: S9GLetwAAAAJ
The Chinese University of Hong Kong
Research interests: Multi-modality Learning, LLM/VLMs, 3D Vision
Homepage
Google Scholar
Citations & Impact (all-time)
Citations: 3,872
H-index: 25
i10-index: 30
Publications: 20
Co-authors: 2
Contact
Email: guoziyu86@gmail.com
GitHub
LinkedIn
Publications (27 items)
- MME-CoF-Pro: Evaluating Reasoning Coherence in Video Generative Models with Text and Visual Hints (2026, 0 citations)
- GENIUS: Generative Fluid Intelligence Evaluation Suite (2026, 0 citations)
- EditThinker: Unlocking Iterative Reasoning for Any Image Editor (2025, 0 citations)
- DraCo: Draft as CoT for Text-to-Image Preview and Rare Concept Generation (2025, 0 citations)
- Architecture Decoupling Is Not All You Need For Unified Multimodal Model (2025, 0 citations)
- Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation (2025, 0 citations)
- Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark (2025, 0 citations)
- BEAR: Benchmarking and Enhancing Multimodal Language Models for Atomic Embodied Capabilities (2025, 0 citations)
Resume
Academic Achievements
- CoT/CoF Reasoning for Visual Generation (arXiv)
- Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark (Technical Report)
- Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step (Under Review)
- T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT (NeurIPS 2025)
- Delving into RL for Image Generation with CoT: A Study on DPO vs. GRPO (NeurIPS 2025)
- SciVerse: Unveiling the Knowledge Comprehension and Visual Reasoning of LMMs on Multi-modal Scientific Problems (ACL 2025)
- MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency (ICML 2025)
- MAVIS: Mathematical Visual Instruction Tuning with an Automatic Data Engine (ICLR 2025)
- MathVerse: Does your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? (ECCV 2024)
- MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines (ICLR 2025)
- Point-Bind & Point-LLM: Aligning Point Cloud with Multi-modality for 3D Understanding, Generation, and Instruction Following (Under Review)
- Exploring the Potential of Encoder-free Architectures in 3D LMMs (Under Review)
- SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners (Technical Report)
- PointCLIP: Point Cloud Understanding by CLIP (CVPR 2022)
Research Experience
- Research Intern at Meta
- Research Intern at Amazon AWS AI Lab
- Research Intern at Roblox
- Research Intern at Tencent
- Research Intern at Shanghai AI Laboratory
Education
- Ph.D. Candidate, The Chinese University of Hong Kong, Department of Computer Science and Engineering, Supervisor: Prof. Pheng-Ann Heng
- Bachelor’s Degree, Peking University, Computer Science, Supervisor: Prof. Bin Cui
Background
- Research Interests: Multi-modal Learning, Large Language/Vision Models, and 3D Vision
- Professional Field: Computer Science and Engineering
Co-authors (2 total)
- Pheng-Ann Heng, Choh-Ming Li Professor of Computer Science and Engineering, The Chinese University of Hong Kong
- Bin Cui, Professor of Computer Science, Peking University