ProcVLM: Learning Procedure-Grounded Progress Rewards for Robotic Manipulation

📅 2026-05-09

🤖 AI Summary
Long-horizon robotic manipulation often lacks dense reward signals aligned with task procedures: existing reward models tend to treat elapsed time as task advancement and struggle to detect stagnation or failure. To address this, this work proposes ProcVLM, a procedure-structure-guided vision-language model built on a reasoning-before-estimation paradigm: it first reasons about the remaining atomic actions and then estimates task progress. ProcVLM combines frame-level subtask-semantic annotations, a progress allocation mechanism driven by intra-subtask visual change, and joint pretraining on progress estimation, action segmentation, and future planning. Using ProcCorpus-60M, a newly curated procedure-aware corpus of 60M annotated frames from 30 embodied datasets, and the ProcVQA benchmark derived from it, ProcVLM outperforms representative baselines on progress estimation and reward-model benchmarks, yielding more discriminative dense reward signals for downstream reward-guided policy optimization.
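The progress-labeling idea (a progress budget per subtask, spread across that subtask's frames in proportion to visual change) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the equal budget per subtask, the Euclidean distance between per-frame feature vectors as the "visual change" measure, the linear fallback for subtasks with no measurable change, and the helper names `visual_change`, `segment_spans`, and `progress_labels` are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def visual_change(a: Sequence[float], b: Sequence[float]) -> float:
    """Hypothetical proxy for visual change: Euclidean distance between frame features."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def segment_spans(subtask_ids: Sequence[int]) -> List[Tuple[int, int]]:
    """Split per-frame subtask labels into contiguous (start, end) index spans."""
    spans = []
    start = 0
    for t in range(1, len(subtask_ids) + 1):
        # Close the current span at the end of the sequence or when the label changes.
        if t == len(subtask_ids) or subtask_ids[t] != subtask_ids[start]:
            spans.append((start, t - 1))
            start = t
    return spans


def progress_labels(features: Sequence[Sequence[float]],
                    subtask_ids: Sequence[int]) -> List[float]:
    """Assign each frame a progress value in [0, 1].

    Assumption: every subtask gets an equal budget of 1/K; inside a subtask the
    budget is released in proportion to cumulative frame-to-frame visual change.
    Change across a subtask boundary is ignored.
    """
    if not features:
        return []
    spans = segment_spans(subtask_ids)
    budget = 1.0 / len(spans)
    labels = [0.0] * len(features)
    base = 0.0  # progress accumulated by completed subtasks
    for start, end in spans:
        # deltas[i] is the change entering frame start + i (0 for the first frame).
        deltas = [0.0] + [visual_change(features[t], features[t - 1])
                          for t in range(start + 1, end + 1)]
        total = sum(deltas)
        n = end - start
        cum = 0.0
        for i, t in enumerate(range(start, end + 1)):
            cum += deltas[i]
            if total > 0:
                frac = cum / total
            else:
                # Hypothetical fallback for a visually static subtask: linear in time.
                frac = i / n if n > 0 else 1.0
            labels[t] = base + budget * frac
        base += budget
    return labels


# Toy trajectory: 1-D features, two subtasks (ids 0 and 1).
labels = progress_labels([[0.0], [0.0], [2.0], [3.0], [3.0], [3.0], [7.0]],
                         [0, 0, 0, 0, 1, 1, 1])
print([round(x, 3) for x in labels])  # [0.0, 0.0, 0.333, 0.5, 0.5, 0.5, 1.0]
```

In this toy trajectory the first two frames are static, so the label stays at 0.0 instead of rising as a time-based interpolation would; progress moves only when the scene changes within a subtask and reaches 0.5 exactly at the subtask boundary.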
📝 Abstract
Long-horizon robotic manipulation requires dense feedback that reflects how a task advances through its procedural stages, not merely whether the final outcome is successful. Existing reward models often rely on trajectory-level success labels or time-based interpolation, which can conflate elapsed time with true task progress and therefore fail to capture unfinished steps, stagnation, and failure states. We present ProcVLM, a progress-aware vision-language model that learns procedure-grounded progress as a dense reward signal for manipulation. Rather than deriving progress from terminal outcomes or temporal proxies, ProcVLM grounds progress estimation in procedural structure and intra-stage visual change, and further adopts a reasoning-before-estimation paradigm that infers the remaining atomic actions before estimating task progress. Specifically, we construct this supervision by synthesizing frame-level subtask-semantic annotations, assigning progress budgets according to subtask structure, and distributing each budget based on intra-subtask visual change. To train ProcVLM at scale, we build a standardized procedural supervision synthesis pipeline and construct ProcCorpus-60M from 30 embodied datasets with 60M annotated frames, from which we derive ProcVQA for procedure-aware pretraining, with progress estimation as the central task alongside action segmentation and future planning. Experiments on ProcVQA and reward-model benchmarks show that ProcVLM improves embodied procedural reasoning and yields more discriminative trajectory-internal progress estimates than representative baselines, supporting its use as a dense reward model for downstream reward-guided policy optimization. Project page: https://procvlm.github.io/
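To show how per-frame progress estimates could serve as a dense reward, here is a minimal Python sketch. It assumes (the source does not state this) that the reward is the change in estimated progress between consecutive steps, r_t = gamma * p_{t+1} - p_t, which is potential-based reward shaping with the progress estimate as the potential. The names `progress_rewards` and `stalled_steps` and the `window`/`eps` thresholds are hypothetical.

```python
from typing import List, Sequence


def progress_rewards(progress: Sequence[float], gamma: float = 1.0) -> List[float]:
    """Dense per-step rewards r_t = gamma * p[t+1] - p[t] from progress estimates."""
    return [gamma * progress[t + 1] - progress[t] for t in range(len(progress) - 1)]


def stalled_steps(progress: Sequence[float], window: int = 3,
                  eps: float = 1e-3) -> List[int]:
    """Hypothetical stagnation check: steps t whose progress gain over the
    previous `window` steps is below eps (a drop also counts as stalled)."""
    return [t for t in range(window, len(progress))
            if progress[t] - progress[t - window] < eps]


# Progress estimates for a toy episode: steady gains, a stall, a regression, recovery.
p = [0.0, 0.2, 0.4, 0.4, 0.4, 0.4, 0.3, 0.6, 1.0]
print([round(r, 2) for r in progress_rewards(p)])  # [0.2, 0.2, 0.0, 0.0, 0.0, -0.1, 0.3, 0.4]
print(stalled_steps(p))  # [5, 6]
```

With gamma = 1 the rewards telescope, so the episode return equals the net progress p_T - p_0. A stall earns zero and the drop from 0.4 to 0.3 earns a negative reward, whereas a time-based proxy would keep paying a positive reward at every step.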
Problem

Research questions and friction points this paper is trying to address.

robotic manipulation
dense reward
task progress
procedural reasoning
long-horizon tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

procedure-grounded reward
progress estimation
vision-language model
robotic manipulation
dense reward signal
Youhe Feng
School of Information, Renmin University of China
Hansen Shi
School of Information, Renmin University of China
Haoyang Li
School of Information, Renmin University of China
Xinlei Guo
School of Information, Renmin University of China
Yang Wang
School of Information, Renmin University of China
Chengyang Zhang
School of Information, Renmin University of China
Jinkai Zhang
School of Information, Renmin University of China
Xiaohan Zhang
Zhipu AI
Jie Tang
UW Madison
Computed Tomography
Jing Zhang
Renmin University of China
large model alignment, model compression & inference optimization, data intelligence