Selected Publications: "Aligning Vision to Language: Annotation-Free Multimodal Knowledge Graph Construction for Enhanced LLMs Reasoning" (ICCV 2025); "HM-RAG: Hierarchical Multi-Agent Multimodal Retrieval Augmented Generation" (ACM MM 2025). For a complete list of publications, please refer to his publications page.
Research Experience
Serving as a reviewer for ICME, SMC, and AAAI, contributing to the peer-review process in the fields of computer vision, multimodal AI, and machine learning.
Master's student in Computer Science at Tongji University, researching multimodal AI, including multimodal retrieval and generation, reasoning with multimodal large language models, and multi-agent interaction in distributed environments. Aiming to enhance the ability of AI systems to understand and reason across multiple modalities while addressing challenges such as privacy preservation, trustworthy reasoning, and model efficiency.
Miscellany
Programming & Frameworks: Python, PyTorch, Distributed Training, Linux, C/C++, Java
Languages: Chinese (native), English (IELTS 7.0), Japanese, German
Specialized Knowledge: Multimodal foundation models, medical image analysis, knowledge graph construction, PEFT, diffusion models