Publications
- ICCV 2025: 'WikiAutoGen: Towards Multi-Modal Wikipedia-Style Article Generation'
- TMLR 2025: 'Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models'
- EMNLP 2025: 'MERMAID: Multi-perspective Self-reflective Agents with Generative Augmentation for Emotion Recognition'
- AAAI 2026: 'InEx: Hallucination Mitigation via Introspection and Cross-Modal Multi-Agent Collaboration'
- SIGGRAPH Asia 2025: 'ReChar: Revitalising Characters with Structure Preserved and User-Specified Aesthetic Enhancements'
- Tech Report: 'Tropical Representations of Chinese Monoids with and without Involution'
Research Experience
- Remote Research Intern, Vision-CAIR, KAUST (Dec 2024 – Present), supervised by Mohamed Elhoseiny
- Research Intern, General Perceptual Computing Group, SenseTime (Feb 2025 – Present)
- Remote Research Intern, BCML Lab, Heriot-Watt University (Mar 2024 – Present)
- Research Assistant, LIAS Lab, CUHK (Shenzhen) (Apr 2024 – Nov 2024)
- Data Analysis Assistant, iFLYTEK (Jun 2023 – Aug 2023)
Education
Bachelor of Science in Mathematics with a minor in Management from Lanzhou University, China
Background
Research Interests: Generative Models (image, video, and sequence generation); Vision-Language (multi-modal comprehension and generation); Efficient Modeling (multi-modal token compression). My long-term goal is to build general-purpose multimodal systems that can perceive, reason, and communicate effectively across visual, textual, and behavioral modalities in dynamic, real-world environments.
Miscellany
Feel free to reach out for collaborations, questions, or just to chat!