- Grand Challenge Summary Paper at ACM MM 2025 on Multimodal Conversational Aspect-based Sentiment Analysis
- Workshop Summary Paper at ACM MM 2025 on Cognition-oriented Multimodal Affective and Empathetic Computing
- Preprint on arXiv on diagnosing video hallucination with a hierarchical framework
- Accepted at TIFS on Poisoning Attacks to Knowledge Distillation-based Federated Learning under Robust Aggregation Rules
- Accepted at ACL 2025 FEVER Workshop on EMULATE: A Multi-Agent Framework for Determining the Veracity of Atomic Claims by Emulating Human Actions
- Accepted at ACL 2025 (Oral) on Aristotle: Mastering Logical Reasoning with A Logic-Complete Decompose-Search-Resolve Framework
- Accepted at ICML 2025 (Oral, Spotlight) on On Path to Multimodal Generalist: General-Level and General-Bench
- Accepted at ICML 2025 on VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models
- Accepted at ICML 2025 on SWIFTCODE: Enhancing Code Generation in Large Language Models through Efficiency-Aware Fine-tuning
- Accepted at ICLR 2025 on PAD: Personalized Alignment at Decoding-Time
- Accepted at WWW 2025 on Towards Multimodal Empathetic Response Generation: A Rich Text-Speech-Vision Avatar-based Benchmark
- New paper published on arXiv on A Survey on Benchmarks of Multimodal Large Language Models
- Accepted at ACM MM Workshop (MIS24) (Best Paper Award) on Fine-grained Structural Hallucination Detection for Unified Visual Comprehension and Generation in Multimodal LLM
- Accepted at ACM MM 2024 (Oral) on PanoSent: A Panoptic Sextuple Extraction Benchmark for Multimodal Conversational Aspect-based Sentiment Analysis
- 2nd Place at SemEval-2024 on NUS-Emo at SemEval-2024 Task 3: Instruction-Tuning LLM for Multimodal Emotion-Cause Analysis in Conversations
- Accepted at TDSC on Towards Class-Balanced Privacy Preserving Heterogeneous Model Aggregation
- Preprint on arXiv on Dr.V: A Hierarchical Perception-Temporal-Cognition Framework to Diagnose Video Hallucination by Fine-grained Spatial-Temporal Grounding
- Accepted at ICML (Oral, Spotlight) on On Path to Multimodal Generalist: General-Level and General-Bench
Research Experience
Conducting research at the Center for Trusted Internet and Community (CTIC) at NUS.
Education
PhD: National University of Singapore, supervised by Prof. Mong Li Lee and Prof. Wynne Hsu; Master's: National University of Singapore; Bachelor's: Wuhan University.
Background
Research interests include Human-Centered AI, Multimodal Understanding, and Multimodal Reasoning. Currently a PhD student at the School of Computing, National University of Singapore.