Publications
- ICCV 2025: 'WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation'
- TMLR 2025: 'Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models'
- EMNLP 2025: 'MERMAID: Multi-perspective Self-reflective Agents with Generative Augmentation for Emotion Recognition'
- AAAI 2026: 'InEx: Hallucination Mitigation via Introspection and Cross-Modal Multi-Agent Collaboration'
- SIGGRAPH Asia 2025: 'ReChar: Revitalising Characters with Structure Preserved and User-Specified Aesthetic Enhancements'
- Tech Report: 'Tropical Representations of Chinese Monoids with and without Involution'
Research Experience
- Remote Research Intern, Vision-CAIR, KAUST (Dec 2024 – Present), supervised by Mohamed Elhoseiny
- Research Intern, General Perceptual Computing Group, SenseTime (Feb 2025 – Present)
- Remote Research Intern, BCML Lab, Heriot-Watt University (Mar 2024 – Present)
- Research Assistant, LIAS Lab, CUHK (Shenzhen) (Apr 2024 – Nov 2024)
- Data Analysis Assistant, iFLYTEK (Jun 2023 – Aug 2023)
Education
Bachelor of Science in Mathematics with a minor in Management from Lanzhou University, China
Background
Research Interests: Generative Models (image, video, and sequence generation); Vision-Language (multi-modal comprehension and generation); Efficient Modeling (multi-modal token compression). My long-term goal is to build general-purpose multimodal systems that can perceive, reason, and communicate effectively across visual, textual, and behavioral modalities in dynamic, real-world environments.
Miscellany
Feel free to reach out for collaborations, questions, or just to chat!