- Long-VLA: Unleashing Long-Horizon Capability of Vision Language Action Model for Robot Manipulation
- CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction
- CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding
- SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
- Unicorn: Text-Only Data Synthesis for Vision Language Model Training
- OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation
- Humanoid-VLA: Towards Universal Humanoid Control with Visual Integration
- Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding
- Rethinking Latent Representations in Behavior Cloning: An Information Bottleneck Approach for Robot Manipulation
- QUART-Online: Latency-Free Large Multimodal Language Model for Quadruped Robot Learning
- GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation
Research Experience
Current research focuses on embodied AI, particularly vision-language-action (VLA) models. He has served as principal investigator or co-investigator on multiple research projects.
Education
Currently a third-year Ph.D. student at Zhejiang University, advised by Prof. Donglin Wang, and enrolled in a joint program with Westlake University as a member of the Machine Intelligence Laboratory (MiLAB). Prior to his Ph.D., he received his M.Sc. degree from the School of Artificial Intelligence, Beijing University of Posts and Telecommunications, in 2022, advised by Prof. Jianqin Yin.
Background
Research Interests: Embodied AI, including VLA models, vision-language models (VLMs), and world models, with a primary focus on the VLA direction. Has published 15 papers as first author, co-first author, or project leader.