Published extensively in top-tier venues including NeurIPS, ICCV, CVPR, ACL, ICML, ECCV, TPAMI, IJCV, TNNLS, AAAI, and ACM MM
Multiple papers accepted as Spotlight presentations (e.g., ICML 2025 with a 2.6% Spotlight rate, ICLR 2025 with 5.1%)
Invited as Area Chair for CVPR 2026
Serving as Panel Co-Chair for ICMR 2025 and Area Chair for BMVC 2025
Led development of the JiuTian series of Multimodal Large Language Models (e.g., JiuTian-LION), published at CVPR 2024
Established the JiuTian-VL GitHub organization to release models and datasets
Made significant contributions to audio-visual MLLMs, GUI agents, robot skill learning, embodied MLLMs, and egocentric video understanding
Background
Professor at School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)
Leads the multimOdal peRception, reasonIng, and decisiON (Orion) Lab
Research focuses on Multimodal Large Language Model (MLLM)-based intelligent agents capable of perceiving, reasoning, and acting through interaction with the world
Recruiting self-motivated M.S. and Ph.D. students for 2026 (3–4 M.S., 2 Ph.D.)