Published extensively in top-tier venues including NeurIPS, ICCV, CVPR, ACL, ICML, ECCV, TPAMI, IJCV, TNNLS, AAAI, and ACM MM
Multiple papers accepted as Spotlight presentations (e.g., ICML 2025 with a 2.6% Spotlight rate, ICLR 2025 with 5.1%)
Invited as Area Chair for CVPR 2026
Serving as Panel Co-Chair for ICMR 2025 and Area Chair for BMVC 2025
Led development of the JiuTian series of Multimodal Large Language Models (e.g., JiuTian-LION), published at CVPR 2024
Established the JiuTian-VL GitHub organization to release models and datasets
Made significant contributions to audio-visual MLLMs, GUI agents, robot skill learning, embodied MLLMs, and egocentric video understanding
Background
Professor at School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)
Leads the multimOdal peRception, reasonIng, and decisiON (Orion) Lab
Research focuses on Multimodal Large Language Model (MLLM)-based intelligent agents capable of perceiving, reasoning, and acting through interaction with the world
Recruiting self-motivated M.S. and Ph.D. students for 2026 (3–4 M.S., 2 Ph.D.)