StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant (NeurIPS 2025)
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models (EMNLP 2025 Findings)
Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering (ACM MM 2024)
Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge (ECCV 2024)
Adapting Multimodal Large Language Models for Video Question Answering by Capturing Question-critical and Coherent Moments (IEEE TMM 2025)
Pixel-level Semantic Correspondence through Layout-aware Representation Learning and Multi-scale Matching Integration (CVPR 2024)
Object-Centric Cross-Modal Knowledge Reasoning for Future Event Prediction in Videos (IEEE TCSVT 2024)
IVRSandplay: An Immersive Virtual Reality Sandplay System Coupled with Hand Motion Capture and Eye Tracking (CSCWD 2023)
Background
Research Interests: Multimodal Large Language Models and their broad applications (Vision-Language Reasoning, Video Understanding, Embodied AI, Unified Image/Video Generation, etc.)