PALM: Progress-Aware Policy Learning via Affordance Reasoning for Long-Horizon Robotic Manipulation

📅 2026-01-11

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing vision–language–action models struggle to identify task-relevant interaction cues and to track subtask progress in long-horizon, multi-step robotic tasks, which leads to errors such as repeated actions, missed steps, and premature termination. To address this, the work proposes PALM, an interaction-centric policy learning framework that combines two components: an affordance representation covering object relevance, contact geometry, spatial placement, and motion dynamics, and a continuous within-subtask progress predictor. PALM achieves a 91.8% success rate on LIBERO-LONG, improves the average length (the mean number of consecutively completed tasks) on the CALVIN ABC→D benchmark by 12.5%, and reports a 2x improvement over baselines in three real-world long-horizon generalization settings.

📝 Abstract

Recent advancements in vision-language-action (VLA) models have shown promise in robotic manipulation, yet they continue to struggle with long-horizon, multi-step tasks. Existing methods lack internal reasoning mechanisms that can identify task-relevant interaction cues or track progress within a subtask, leading to critical execution errors such as repeated actions, missed steps, and premature termination. To address these challenges, we introduce PALM, a VLA framework that structures policy learning around interaction-centric affordance reasoning and subtask progress cues. PALM distills complementary affordance representations that capture object relevance, contact geometry, spatial placements, and motion dynamics, and serve as task-relevant anchors for visuomotor control. To further stabilize long-horizon execution, PALM predicts continuous within-subtask progress, enabling seamless subtask transitions. Across extensive simulation and real-world experiments, PALM consistently outperforms baselines, achieving a 91.8% success rate on LIBERO-LONG, a 12.5% improvement in average length on CALVIN ABC->D, and a 2x improvement over real-world baselines across three long-horizon generalization settings.
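
As a rough illustration of how a continuous within-subtask progress signal could drive subtask transitions, the sketch below gates the switch to the next subtask on the predicted progress. It is not PALM's implementation: the `ProgressGate` class, the `threshold` and `patience` parameters, and the subtask names are all assumptions introduced here, and the abstract does not specify how transitions are actually triggered.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProgressGate:
    """Hypothetical helper: advance subtasks when predicted progress stays high."""
    subtasks: List[str]
    threshold: float = 0.95  # assumed completion threshold, not from the paper
    patience: int = 3        # assumed number of consecutive steps above threshold
    index: int = 0
    _streak: int = 0

    @property
    def done(self) -> bool:
        # True once every subtask has been completed.
        return self.index >= len(self.subtasks)

    @property
    def current(self) -> str:
        # Name of the active subtask (raises IndexError when done).
        return self.subtasks[self.index]

    def update(self, progress: float) -> bool:
        """Feed one progress prediction in [0, 1]; return True on a transition."""
        if self.done:
            return False
        progress = min(max(progress, 0.0), 1.0)  # clamp noisy predictions
        # Count consecutive steps at or above the threshold; reset otherwise.
        self._streak = self._streak + 1 if progress >= self.threshold else 0
        if self._streak >= self.patience:
            self.index += 1   # move forward only, so finished subtasks never repeat
            self._streak = 0
            return True
        return False
```

In this toy version, requiring several consecutive high predictions guards against a single spike ending a subtask early, and the forward-only index rules out repeating a finished subtask. These correspond loosely to the premature-termination and repetition errors named in the abstract.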

Problem

Research questions and friction points this paper is trying to address.

long-horizon manipulation

multi-step tasks

progress tracking

affordance reasoning

execution errors

Innovation

Methods, ideas, or system contributions that make the work stand out.

affordance reasoning

progress-aware policy learning

long-horizon manipulation

vision-language-action models

subtask progress tracking

🔎 Similar Papers

What Foundation Models can Bring for Robot Learning in Manipulation: A Survey

2024-04-28 · arXiv.org · Citations: 15

VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation

2024-05-26 · arXiv.org · Citations: 0