VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models, CVPR 2025 (Highlight, Top 3%)
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis, CVPR 2025 (Highlight, Top 3%, Most Influential Paper of CVPR 2025)
Temporal Reasoning Transfer from Text to Video, ICLR 2025
VLFeedback: A Large-Scale AI Feedback Dataset for Large Vision-Language Models Alignment, EMNLP 2024
VITATECS: A Diagnostic Dataset for Temporal Concept Understanding of Video-Language Models, ECCV 2024
Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models, ACL 2024
Can Language Models Understand Physical Concepts?, EMNLP 2023
M3IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning, arXiv preprint
A Survey on In-context Learning, EMNLP 2024 (Most Influential Paper of EMNLP 2024)
Background
Research interests include developing frontier multimodal large language models (e.g., MiMo-VL, Reka Flash) and understanding the fundamental mechanisms of LLMs and MLLMs (e.g., in-context learning, LLMs-as-a-judge). Currently a PhD student in the HKU-NLP group, co-supervised by Prof. Lingpeng Kong and Prof. Qi Liu. Master's degree from Peking University, advised by Prof. Xu Sun; Bachelor's degree from Xidian University.
Miscellany
Always happy to discuss potential collaborations—feel free to reach out!