AI Summary
In vision-based reinforcement learning, real-world scenarios often lack precise pose feedback, which renders distance-based reward design ineffective. To address this, we propose ReLAM, a novel framework that introduces a keypoint-guided visual anticipation model as a planner: it implicitly encodes spatial relationships and generates geometrically aligned intermediate subgoals that form a structured learning curriculum. From this curriculum, ReLAM automatically distills dense, continuous rewards with a provable sub-optimality bound. The method requires only action-free demonstration videos and integrates keypoint detection, visual anticipation, and goal-conditioned hierarchical reinforcement learning. Evaluated on complex, long-horizon manipulation tasks, ReLAM significantly improves sample efficiency and final performance, surpassing existing state-of-the-art approaches.
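The summary's core idea of distilling a dense reward from keypoint subgoals can be illustrated with a minimal sketch. The paper does not publish its exact shaping function; the form below (negative mean L2 distance between current and subgoal keypoints) is a hypothetical stand-in for how keypoints can implicitly encode spatial distance without ground-truth poses:

```python
import numpy as np

def keypoint_subgoal_reward(current_kpts, subgoal_kpts):
    """Hypothetical dense reward: negative mean L2 distance between
    corresponding keypoints (ReLAM's exact shaping may differ).

    current_kpts, subgoal_kpts: arrays of shape (K, 2) or (K, 3),
    one row per detected keypoint.
    """
    current_kpts = np.asarray(current_kpts, dtype=float)
    subgoal_kpts = np.asarray(subgoal_kpts, dtype=float)
    # Per-keypoint Euclidean distance, averaged into a scalar reward.
    dists = np.linalg.norm(current_kpts - subgoal_kpts, axis=-1)
    return -float(dists.mean())
```

The reward is maximal (zero) when every keypoint matches the subgoal configuration, and grows more negative as the scene's geometry deviates from it, giving the low-level policy a continuous learning signal.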
Abstract
Reward design remains a critical bottleneck in visual reinforcement learning (RL) for robotic manipulation. In simulated environments, rewards are conventionally designed based on the distance to a target position. However, such precise positional information is often unavailable in real-world visual settings due to sensory and perceptual limitations. In this study, we propose a method that implicitly infers spatial distances through keypoints extracted from images. Building on this, we introduce the Reward Learning with Anticipation Model (ReLAM), a novel framework that automatically generates dense, structured rewards from action-free video demonstrations. ReLAM first learns an anticipation model that serves as a planner and proposes intermediate keypoint-based subgoals on the optimal path to the final goal, creating a structured learning curriculum directly aligned with the task's geometric objectives. Based on the anticipated subgoals, a continuous reward signal is provided to train a low-level, goal-conditioned policy under the hierarchical reinforcement learning (HRL) framework with a provable sub-optimality bound. Extensive experiments on complex, long-horizon manipulation tasks show that ReLAM significantly accelerates learning and achieves superior performance compared to state-of-the-art methods.
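The hierarchical structure described above can be sketched at a high level: an anticipation model proposes the next keypoint subgoal, and a low-level goal-conditioned policy chases it until the final goal is reached. Everything below is a toy stand-in under stated assumptions: `anticipate_subgoal` replaces the learned planner with linear interpolation, and the "policy" jumps straight to each subgoal, so only the control flow (subgoal proposal, pursuit, repeat) mirrors the framework:

```python
import numpy as np

def anticipate_subgoal(current, final_goal, step=0.5):
    """Stand-in for the learned anticipation model: propose a keypoint
    configuration a fixed fraction of the way toward the final goal.
    ReLAM learns this mapping from action-free demonstration videos."""
    return current + step * (final_goal - current)

def run_curriculum(start, final_goal, tol=1e-2, max_subgoals=50):
    """Toy low-level 'policy' that reaches each anticipated subgoal
    exactly, illustrating the subgoal curriculum's control flow."""
    kpts = np.asarray(start, dtype=float)
    goal = np.asarray(final_goal, dtype=float)
    for _ in range(max_subgoals):
        if np.linalg.norm(kpts - goal) < tol:
            break  # final goal reached within tolerance
        subgoal = anticipate_subgoal(kpts, goal)
        kpts = subgoal  # stand-in for the goal-conditioned policy rollout
    return kpts

final = run_curriculum(start=[0.0, 0.0], final_goal=[1.0, 1.0])
```

In the actual framework each "jump" would be a rollout of the trained low-level policy, rewarded densely for reducing its keypoint distance to the current subgoal.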