Publications
VLA^2: Empowering Vision-Language-Action Models with an Agentic Framework for Unseen Concept Manipulation
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-Language-Action Model
Towards a Unified Understanding of Robot Manipulation: A Comprehensive Survey
OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation
CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding
RationalVLA: A Rational Vision-Language-Action Model with Dual System
Unlock Reliable Skill Inference for Quadruped Adaptive Behavior by Skill Graph
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
Research Experience
Conducting research at the Machine Intelligence Lab (MiLAB), focusing on foundation models and reinforcement learning algorithms for robotics.
Education
Received Bachelor's and Master's degrees in Control Science and Engineering from Beijing University of Posts and Telecommunications (BUPT) in 2020 and 2023, respectively. Currently a third-year Ph.D. student in Computer Science and Technology in the joint program of Zhejiang University and Westlake University, advised by Prof. Donglin Wang.
Background
Research interests include Embodied Artificial Intelligence, Foundation Models, Reinforcement Learning, and Robotics, with a particular focus on developing efficient and effective foundation models for robotics and scalable reinforcement learning algorithms.