Paper accepted at ACM MM 2025; won first place in the Micro-gesture Classification sub-challenge at MiGA@IJCAI 2025; contributed to projects such as MA-Bench: Towards Fine-grained Micro-Action Understanding.
Research Experience
Current research focuses on text-guided video anomaly detection (VAD) with Large Vision-Language Models (LVLMs), aiming at fine-grained, interpretable, and human-centered video understanding. During undergraduate studies, conducted research on visual perception systems for assisting visually impaired individuals, exploring multimodal sensing, intelligent interaction, and navigation technologies.
Education
Currently pursuing an MSc in Computer Graphics, Vision and Imaging at University College London, supervised by Assoc. Prof. Kaan Akşit; received a BEng in Computer Science and Technology from Hefei University of Technology, supervised by Prof. Dan Guo.
Background
Research interests span computer vision, multimodal learning, and vision-language understanding, with an emphasis on perception, reasoning, and generation in visual intelligence. Particularly interested in building systems that connect human motion, emotion, and cognition through multimodal signals; additional interests include autonomous driving and environmental perception.