Task-Oriented Grasping Using Reinforcement Learning with a Contextual Reward Machine

📅 2025-12-10

🤖 AI Summary
Task-oriented grasping suffers from low success rates and poor learning efficiency because of object diversity, complex functional attributes, and highly variable grasp topologies. To address these challenges, this paper proposes a reinforcement learning framework built around a Contextual Reward Machine (CRM). The CRM decomposes a grasping task into multi-stage sub-tasks, each equipped with its own reward function, action space, and state abstraction. It also introduces transition rewards that encourage or penalize transitions between stages, steering the policy toward desirable stage sequences. Key contributions include: (i) using the CRM to decompose tasks and shrink the state-action space, and (ii) explicitly rewarding stage transitions to keep exploration within defined boundaries and accelerate convergence. Combined with Proximal Policy Optimization (PPO) and evaluated on 1,000 simulated tasks spanning diverse objects, affordances, and grasp topologies, the method achieves a 95% success rate. On a real robot, it attains 83.3% success across 60 grasping trials covering six affordances. It outperforms state-of-the-art methods in both learning speed and success rate, demonstrating high accuracy and data efficiency.

📝 Abstract
This paper presents a reinforcement learning framework that incorporates a Contextual Reward Machine for task-oriented grasping. The Contextual Reward Machine reduces task complexity by decomposing grasping tasks into manageable sub-tasks. Each sub-task is associated with a stage-specific context, including a reward function, an action space, and a state abstraction function. This contextual information enables efficient intra-stage guidance and improves learning efficiency by reducing the state-action space and guiding exploration within clearly defined boundaries. In addition, transition rewards are introduced to encourage or penalize transitions between stages, which guides the model toward desirable stage sequences and further accelerates convergence. When integrated with the Proximal Policy Optimization algorithm, the proposed method achieved a 95% success rate across 1,000 simulated grasping tasks encompassing diverse objects, affordances, and grasp topologies. It outperformed state-of-the-art methods in both learning speed and success rate. The approach was transferred to a real robot, where it achieved a success rate of 83.3% in 60 grasping tasks over six affordances. These experimental results demonstrate superior accuracy, data efficiency, and learning efficiency, and they underscore the model's potential to advance task-oriented grasping in both simulated and real-world settings.
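
The decomposition described in the abstract can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not the authors' implementation: the class names StageContext and ContextualRewardMachine, the stage names "approach" and "grasp", the guard predicates, the raw-state keys dist and contact, and all numeric values are hypothetical. Each stage carries its own reward function, allowed action indices, and state abstraction, and each stage transition carries a reward that is positive (encourage) or negative (penalize).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical raw state: a dict of named features (not the paper's state format).
State = Dict[str, float]


@dataclass
class StageContext:
    """Stage-specific context: reward, allowed actions, and state abstraction."""
    reward_fn: Callable[[State], float]
    allowed_actions: List[int]  # indices into the full action space
    abstract_fn: Callable[[State], Tuple[float, ...]]


@dataclass
class ContextualRewardMachine:
    contexts: Dict[str, StageContext]
    # (from_stage, to_stage) -> (guard on the raw state, transition reward)
    transitions: Dict[Tuple[str, str], Tuple[Callable[[State], bool], float]]
    stage: str

    def step(self, state: State) -> Tuple[Tuple[float, ...], float, List[int]]:
        """Advance the machine on a new raw state.

        Takes at most one stage transition per call, then returns the abstracted
        observation, the total reward (transition reward plus the active stage's
        reward), and the action indices allowed in the active stage.
        """
        reward = 0.0
        for (src, dst), (guard, r_trans) in self.transitions.items():
            if src == self.stage and guard(state):
                reward += r_trans  # positive encourages, negative penalizes
                self.stage = dst
                break
        ctx = self.contexts[self.stage]
        reward += ctx.reward_fn(state)
        return ctx.abstract_fn(state), reward, ctx.allowed_actions


# Hypothetical two-stage task: move close to the object, then close the fingers.
crm = ContextualRewardMachine(
    contexts={
        "approach": StageContext(
            reward_fn=lambda s: -s["dist"],      # dense reward: get closer
            allowed_actions=[0, 1, 2],           # e.g. end-effector translation
            abstract_fn=lambda s: (s["dist"],),  # only distance matters here
        ),
        "grasp": StageContext(
            reward_fn=lambda s: s["contact"],    # dense reward: finger contact
            allowed_actions=[3, 4],              # e.g. close / open fingers
            abstract_fn=lambda s: (s["contact"],),
        ),
    },
    transitions={
        ("approach", "grasp"): (lambda s: s["dist"] < 0.02, 1.0),   # desirable
        ("grasp", "approach"): (lambda s: s["dist"] > 0.05, -1.0),  # regression
    },
    stage="approach",
)

obs1, r1, mask1 = crm.step({"dist": 0.10, "contact": 0.0})  # stays in "approach"
obs2, r2, mask2 = crm.step({"dist": 0.01, "contact": 0.5})  # enters "grasp" (+1.0)
obs3, r3, mask3 = crm.step({"dist": 0.10, "contact": 0.0})  # falls back (-1.0)
```

Keeping the abstraction and the allowed actions inside each stage is what narrows the problem the policy faces at any one time, while the transition table is where the encourage/penalize signal for stage sequencing lives.
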
Problem

Develops reinforcement learning for task-oriented robotic grasping
Reduces task complexity via contextual reward machine decomposition
Improves learning efficiency and success rates in simulations and real robots
Innovation

Contextual Reward Machine decomposes grasping tasks into sub-tasks
Stage-specific contexts guide exploration and reduce state-action space
Transition rewards accelerate convergence by encouraging desirable stage sequences
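
One common way to restrict a stage's action space when a policy has a discrete action head (as in many PPO implementations) is invalid-action masking: disallowed actions are excluded before the softmax, so the policy never samples or explores them. The source does not say how the action-space restriction is implemented; this sketch assumes a discrete action space, and the function name masked_softmax, the five-action layout, and the logit values are hypothetical.

```python
import math
import random
from typing import Iterable, List, Sequence


def masked_softmax(logits: Sequence[float], allowed: Iterable[int]) -> List[float]:
    """Softmax restricted to the allowed action indices; all others get probability 0."""
    allowed_set = set(allowed)
    # Subtract the largest allowed logit for numerical stability.
    m = max(logits[i] for i in allowed_set)
    exps = [math.exp(x - m) if i in allowed_set else 0.0 for i, x in enumerate(logits)]
    z = sum(exps)
    return [e / z for e in exps]


# Hypothetical policy-head outputs for a 5-action discrete space.
logits = [2.0, 0.5, 0.5, 3.0, 1.0]
probs_full = masked_softmax(logits, range(len(logits)))  # no stage restriction
probs_grasp = masked_softmax(logits, [3, 4])             # e.g. finger actions only

# Sampling from the masked distribution never selects a disallowed action.
rng = random.Random(0)
samples = rng.choices(range(len(logits)), weights=probs_grasp, k=1000)
```

Masking removes disallowed actions from exploration entirely rather than merely penalizing them, which is one plausible way to realize exploration "within clearly defined boundaries"; a continuous action space would need a different mechanism.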