ReLAM: Learning Anticipation Model for Rewarding Visual Robotic Manipulation

📅 2025-09-26
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
In vision-based reinforcement learning, real-world scenarios often lack precise pose feedback, rendering distance-based reward design ineffective. To address this, we propose ReLAM, a novel framework that introduces a keypoint-guided visual imagination model as a planner, implicitly encoding spatial relationships and generating geometrically aligned intermediate subgoals to construct a structured instructional curriculum. From this curriculum, ReLAM automatically distills dense, continuous rewards with a provable suboptimality bound. The method requires only action-free demonstration videos and integrates keypoint detection, visual imagination, hierarchical RL, and goal-conditioned policy learning. Evaluated on complex, long-horizon manipulation tasks, ReLAM significantly improves sample efficiency and final performance, surpassing existing state-of-the-art approaches.

πŸ“ Abstract
Reward design remains a critical bottleneck in visual reinforcement learning (RL) for robotic manipulation. In simulated environments, rewards are conventionally designed based on the distance to a target position. However, such precise positional information is often unavailable in real-world visual settings due to sensory and perceptual limitations. In this study, we propose a method that implicitly infers spatial distances through keypoints extracted from images. Building on this, we introduce Reward Learning with Anticipation Model (ReLAM), a novel framework that automatically generates dense, structured rewards from action-free video demonstrations. ReLAM first learns an anticipation model that serves as a planner and proposes intermediate keypoint-based subgoals on the optimal path to the final goal, creating a structured learning curriculum directly aligned with the task's geometric objectives. Based on the anticipated subgoals, a continuous reward signal is provided to train a low-level, goal-conditioned policy under the hierarchical reinforcement learning (HRL) framework with a provable sub-optimality bound. Extensive experiments on complex, long-horizon manipulation tasks show that ReLAM significantly accelerates learning and achieves superior performance compared to state-of-the-art methods.
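The abstract describes a continuous reward derived from the distance between the current keypoints and the anticipated subgoal keypoints. A minimal sketch of such a keypoint-distance reward, assuming both are (K, 2) arrays in image coordinates (the function name and scaling are illustrative assumptions, not the authors' implementation):

```python
import numpy as np

def keypoint_subgoal_reward(keypoints, subgoal_keypoints, scale=1.0):
    """Dense reward as the negative mean Euclidean distance between
    current keypoints and anticipated subgoal keypoints.

    Both inputs are (K, 2) arrays of keypoint locations in image
    coordinates. Illustrative stand-in for ReLAM's keypoint-based
    reward, not the paper's exact formulation.
    """
    keypoints = np.asarray(keypoints, dtype=float)
    subgoal_keypoints = np.asarray(subgoal_keypoints, dtype=float)
    # Per-keypoint Euclidean distance, averaged over the K keypoints;
    # reward is maximal (zero) when the subgoal configuration is reached.
    dists = np.linalg.norm(keypoints - subgoal_keypoints, axis=-1)
    return -scale * float(dists.mean())
```

Because the reward depends only on keypoints extracted from images, no ground-truth object pose is needed, which is the core motivation stated in the abstract.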
Problem

Research questions and friction points this paper is trying to address.

Reward design bottleneck in visual robotic reinforcement learning
Inferring spatial distances from images using keypoint extraction
Automating dense reward generation from action-free video demonstrations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learns anticipation model for subgoal planning
Generates dense rewards from action-free video demonstrations
Uses keypoint-based distance inference for visual RL
Authors
Nan Tang
National Institute of Biological Sciences, Beijing
stem cell biology, aging, lung diseases
Jing-Cheng Pang
Researcher, Huawei; Nanjing University
reinforcement learning, language-conditioned RL, large language models
Guanlin Li
National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China
Chao Qian
Nanjing University
artificial intelligence, evolutionary algorithms, machine learning
Yang Yu
National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China