Robo-Dopamine: General Process Reward Modeling for High-Precision Robotic Manipulation

📅 2025-12-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
In real-world robotic reinforcement learning, reward function design remains challenging: existing process reward models (PRMs) lack step awareness and rely on single-view perception, and reward shaping theory is underdeveloped, often leading to semantic traps. Method: we propose a step-aware, multi-view-fused process reward modeling framework that (i) introduces step-level reward discretization and multi-view reward fusion, and (ii) establishes a policy-invariant reward shaping theory that provably eliminates misleading optimization. Using over 3,400 hours of multi-view robotic manipulation data, we train a General Reward Model (GRM), integrated with the Dopamine-RL framework and one-shot task adaptation. Results: GRM achieves state-of-the-art evaluation accuracy; on novel tasks, a single expert demonstration plus 150 online interactions (~1 hour) suffices to raise the policy success rate from ≈0% to 95%, demonstrating strong generalization.
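
The two ingredients in (i) are easiest to see in miniature. Below is a minimal Python sketch, assuming the reward model emits a per-view softmax over discrete task steps; the bin count, the entropy-weighted fusion rule, and all names here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def discretize_progress(progress: float, num_steps: int = 10) -> int:
    """Map a continuous progress estimate in [0, 1] to a discrete step index.

    Stand-in for step-level reward discretization: the reward model scores
    which of `num_steps` task stages a frame has reached, rather than
    predicting a raw scalar. The bin count is an assumption.
    """
    return int(np.clip(np.floor(progress * num_steps), 0, num_steps - 1))

def fuse_views(per_view_probs: np.ndarray) -> int:
    """Fuse per-view step distributions into a single step estimate.

    per_view_probs: shape (V, num_steps), one softmax per camera view.
    Entropy-weighted averaging lets confident views dominate; the paper's
    actual fusion rule may differ, this only illustrates the idea.
    """
    eps = 1e-8
    entropy = -(per_view_probs * np.log(per_view_probs + eps)).sum(axis=1)
    weights = np.exp(-entropy)
    weights = weights / weights.sum()
    fused = (weights[:, None] * per_view_probs).sum(axis=0)
    return int(fused.argmax())
```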

📝 Abstract
The primary obstacle to applying reinforcement learning (RL) to real-world robotics is the design of effective reward functions. While recent learning-based Process Reward Models (PRMs) are a promising direction, they are often hindered by two fundamental limitations: their reward models lack step-aware understanding and rely on single-view perception, leading to unreliable assessments of fine-grained manipulation progress; and their reward shaping procedures are theoretically unsound, often inducing a semantic trap that misguides policy optimization. To address these limitations, we introduce Dopamine-Reward, a novel reward modeling method for learning a general-purpose, step-aware process reward model from multi-view inputs. At its core is our General Reward Model (GRM), trained on a 3,400+ hour dataset, which leverages Step-wise Reward Discretization for structural understanding and Multi-Perspective Reward Fusion to overcome perceptual limitations. Building upon Dopamine-Reward, we propose Dopamine-RL, a robust policy learning framework that employs a theoretically sound Policy-Invariant Reward Shaping method, which enables the agent to leverage dense rewards for efficient self-improvement without altering the optimal policy, thereby fundamentally avoiding the semantic trap. Extensive experiments across diverse simulated and real-world tasks validate our approach. GRM achieves state-of-the-art accuracy in reward assessment, and Dopamine-RL built on GRM significantly improves policy learning efficiency. For instance, after GRM is adapted to a new task in a one-shot manner from a single expert trajectory, the resulting reward model enables Dopamine-RL to improve the policy from near-zero to 95% success with only 150 online rollouts (approximately 1 hour of real-robot interaction), while retaining strong generalization across tasks. Project website: https://robo-dopamine.github.io
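
The policy-invariance claim matches the classic potential-based shaping result of Ng, Harada, and Russell (1999): adding F(s, s') = γΦ(s') − Φ(s) to the task reward leaves the optimal policy unchanged for any potential function Φ. A minimal sketch, assuming the GRM's progress estimate serves as Φ (the paper's exact construction is not shown here):

```python
def shaped_reward(
    r_env: float,      # sparse task reward from the environment
    phi_s: float,      # potential of current state, e.g. GRM progress in [0, 1]
    phi_s_next: float, # potential of next state under the same GRM
    gamma: float = 0.99,
    done: bool = False,
) -> float:
    """Potential-based shaping: F(s, s') = gamma * Phi(s') - Phi(s).

    Because F telescopes along any trajectory, adding it to the sparse
    reward densifies the learning signal without changing the optimal
    policy (Ng et al., 1999). Terminal states conventionally use Phi = 0
    so the shaping term cannot bias episode returns.
    """
    phi_next = 0.0 if done else phi_s_next
    return r_env + gamma * phi_next - phi_s
```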
Problem

Research questions and friction points this paper is trying to address.

Designing effective reward functions for real-world robotics reinforcement learning
Overcoming unreliable fine-grained manipulation progress assessment in process reward models
Avoiding the semantic trap in reward shaping so that shaping does not alter the optimal policy
Innovation

Methods, ideas, or system contributions that make the work stand out.

General Reward Model with step-aware multi-view perception
Policy-Invariant Reward Shaping for efficient policy learning
One-shot adaptation of the reward model from a single expert trajectory (see the sketch after this list)
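
A plausible reading of the one-shot adaptation step, sketched below: frames of the single expert demonstration receive monotonically increasing progress targets, and the pretrained GRM is briefly fine-tuned to regress them. The interface of `grm` and the whole recipe are assumptions for illustration, not the paper's procedure.

```python
import torch

def one_shot_adapt(grm: torch.nn.Module, demo_frames: torch.Tensor,
                   lr: float = 1e-4, epochs: int = 50) -> torch.nn.Module:
    """Fine-tune a pretrained reward model on a single expert demo.

    demo_frames: (T, C, H, W) frames of one successful trajectory.
    Targets rise linearly from 0 (start) to 1 (task complete); `grm` is
    assumed to map a frame batch to per-frame progress predictions.
    Hypothetical recipe; the paper's adaptation procedure may differ.
    """
    targets = torch.linspace(0.0, 1.0, demo_frames.shape[0])
    opt = torch.optim.Adam(grm.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        pred = grm(demo_frames).squeeze(-1)  # (T,)
        loss = torch.nn.functional.mse_loss(pred, targets)
        loss.backward()
        opt.step()
    return grm
```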
👥 Authors

Huajie Tan · Peking University · Embodied AI, Foundation Models
Sixiang Chen · State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
Yijie Xu · Hong Kong University of Science and Technology (Guangzhou) · Data Mining, Natural Language Processing, Large Language Models
Zixiao Wang · University of Science and Technology of China
Yuheng Ji · Institute of Automation, Chinese Academy of Sciences · Embodied AI, Computer Vision
Cheng Chi · Columbia University, Stanford University · Robotics
Yaoxu Lyu · State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
Zhongxia Zhao · State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
Xiansheng Chen · Beijing Academy of Artificial Intelligence
Peterson Co · State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
Shaoxuan Xie · Beijing Academy of Artificial Intelligence
Guocai Yao · Beijing Academy of Artificial Intelligence
Pengwei Wang · University of Calgary · Computer Science, Security
Zhongyuan Wang · Beijing Academy of Artificial Intelligence
Shanghang Zhang · Peking University · Embodied AI, Foundation Models