- Long-VLA: Unleashing Long-Horizon Capability of Vision Language Action Model for Robot Manipulation
- CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction
- CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding
- SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
- Unicorn: Text-Only Data Synthesis for Vision Language Model Training
- OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation
- Humanoid-VLA: Towards Universal Humanoid Control with Visual Integration
- Accelerating Vision-Language-Action Model Integrated with Action Chunking via Parallel Decoding
- Rethinking Latent Representations in Behavior Cloning: An Information Bottleneck Approach for Robot Manipulation
- QUART-Online: Latency-Free Large Multimodal Language Model for Quadruped Robot Learning
- GEVRM: Goal-Expressive Video Generation Model For Robust Visual Manipulation
Research Experience
Current research focuses on embodied AI, particularly vision-language-action (VLA) models. He has served as principal investigator or co-investigator on multiple research projects.
Education
Currently a third-year Ph.D. student at Zhejiang University, advised by Prof. Donglin Wang, and enrolled in a joint program with Westlake University as a member of the Machine Intelligence Laboratory (MiLAB). Prior to his Ph.D., he received his M.Sc. degree from the School of Artificial Intelligence, Beijing University of Posts and Telecommunications, in 2022, advised by Prof. Jianqin Yin.
Background
Research Interests: Embodied AI, including VLA models, vision-language models (VLMs), and world models, with a primary focus on the VLA direction. Has published 15 papers as first author, co-first author, or project leader.