Publications
VLA^2: Empowering Vision-Language-Action Models with an Agentic Framework for Unseen Concept Manipulation
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-Language-Action Model
Towards a Unified Understanding of Robot Manipulation: A Comprehensive Survey
OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation
CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding
RationalVLA: A Rational Vision-Language-Action Model with Dual System
Unlock Reliable Skill Inference for Quadruped Adaptive Behavior by Skill Graph
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
Research Experience
Conducting research at the Machine Intelligence Lab (MiLAB), focusing on foundation models and reinforcement learning algorithms for robotics.
Education
Received Bachelor's and Master's degrees in Control Science and Engineering from Beijing University of Posts and Telecommunications (BUPT) in 2020 and 2023, respectively. Currently a third-year Ph.D. student in Computer Science and Technology in the joint program of Zhejiang University and Westlake University, advised by Prof. Donglin Wang.
Background
Research interests include Embodied Artificial Intelligence, Foundation Models, Reinforcement Learning, and Robotics, with a particular focus on developing efficient and effective foundation models for robotics and scalable reinforcement learning algorithms.