Exploiting Policy Idling for Dexterous Manipulation

📅 2025-08-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
In dexterous manipulation, learned policies often exhibit “idling”—oscillatory, non-progressing behavior near target states—leading to poor robustness and inefficient exploration. This paper identifies idling not as a failure mode but as an exploitable exploration signal, and proposes Pause-Induced Perturbations (PIP): an online method that detects idling states in real time and injects lightweight, state-dependent perturbations to stimulate productive exploration and accelerate policy optimization. PIP requires no additional supervision, incurs no training overhead, and relies solely on the existing policy and observed states. Evaluated in multi-task simulated environments, PIP significantly improves policy iteration efficiency and cross-task generalization. On challenging real-world peg-in-hole insertion tasks, it boosts success rates by 15–35%, empirically validating the efficacy and practicality of idling-guided exploration.

📝 Abstract
Learning-based methods for dexterous manipulation have made notable progress in recent years. However, learned policies often still lack reliability and exhibit limited robustness to important factors of variation. One failure pattern that can be observed across many settings is that policies idle, i.e., they cease to move beyond a small region of states once they reach certain states. This policy idling is often a reflection of the training data. For instance, it can occur when the data contains small actions in areas where the robot needs to perform high-precision motions, e.g., when preparing to grasp an object or to insert an object. Prior works have tried to mitigate this phenomenon, e.g., by filtering the training data or modifying the control frequency. However, these approaches can negatively impact policy performance in other ways. As an alternative, we investigate how to leverage the detectability of idling behavior to inform exploration and policy improvement. Our approach, Pause-Induced Perturbations (PIP), applies perturbations at detected idling states, thus helping the policy escape problematic basins of attraction. On a range of challenging simulated dual-arm tasks, we find that this simple approach can already noticeably improve test-time performance, with no additional supervision or training. Furthermore, since the robot tends to idle at critical points in a movement, we also find that learning from the resulting episodes leads to better iterative policy improvement compared to prior approaches. Our perturbation strategy also leads to a 15-35% improvement in absolute success rate on a real-world insertion task that requires complex multi-finger manipulation.
Problem

Research questions and friction points this paper is trying to address.

Addresses policy idling in learned dexterous manipulation policies
Mitigates limited robustness and reliability in manipulation tasks
Improves policy performance without additional supervision or training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Detects policy idling states automatically
Applies perturbations to escape idling states
Improves manipulation without extra supervision
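The detect-then-perturb loop described above can be sketched roughly as follows. This is a minimal illustrative sketch, not the paper's implementation: the class name, the displacement-window idle test, and all thresholds (`window`, `idle_threshold`, `noise_scale`) are assumptions chosen for clarity. The idea is simply that when recent states stay within a small region, a lightweight random perturbation is added to the policy's action to escape the basin of attraction.

```python
import numpy as np

class PauseInducedPerturbation:
    """Hypothetical sketch of idling-guided exploration (names and
    thresholds are illustrative, not taken from the paper)."""

    def __init__(self, window=10, idle_threshold=1e-3, noise_scale=0.05, rng=None):
        self.window = window                # number of recent states to inspect
        self.idle_threshold = idle_threshold  # max displacement still counted as idling
        self.noise_scale = noise_scale      # std-dev of the injected perturbation
        self.history = []
        self.rng = rng if rng is not None else np.random.default_rng(0)

    def is_idling(self, state):
        """Idling heuristic: over the last `window` states, the robot has not
        moved more than `idle_threshold` away from the oldest state."""
        self.history.append(np.asarray(state, dtype=float))
        self.history = self.history[-self.window:]
        if len(self.history) < self.window:
            return False
        displacements = np.linalg.norm(
            np.stack(self.history) - self.history[0], axis=1)
        return float(displacements.max()) < self.idle_threshold

    def act(self, policy_action, state):
        """Pass the policy action through unchanged, or add Gaussian noise
        when an idling state is detected."""
        action = np.asarray(policy_action, dtype=float)
        if self.is_idling(state):
            action = action + self.rng.normal(0.0, self.noise_scale,
                                              size=action.shape)
        return action
```

In use, the wrapper sits between the frozen policy and the environment at test time (no retraining): a stationary state stream leaves the first few actions untouched, then triggers perturbations once the window fills.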