🤖 AI Summary
Robots often fail at dynamic tasks, such as striking a puck into an unreachable region, because the object's physical properties (e.g., friction coefficient, elastic modulus) are unknown, and they cannot recover autonomously from failure without prior system identification.
Method: This paper proposes a task-aware reinforcement learning exploration strategy that constructs an error-driven reward function from the gradient sensitivity of a privileged task policy to parameter-estimation errors. It combines uncertainty quantification with an adaptive exploration-execution switching mechanism, enabling the robot to focus exploration on the physical parameters most critical to the task and identify them efficiently online.
Contribution/Results: Experiments on a KUKA iiwa platform demonstrate an average exploration time of only 1.2 seconds and a 90% task success rate, substantially outperforming baseline methods (≤40%). To the best of the authors' knowledge, this work presents the first real-world closed-loop validation that integrates rapid physical-parameter identification with adaptive task execution.
📝 Abstract
In many dynamic robotic tasks, such as striking pucks into a goal outside the reachable workspace, the robot must first identify the relevant physical properties of the object before executing the task, since it cannot recover from failure or retry without human intervention. To address this challenge, we propose a task-informed exploration approach, based on reinforcement learning, that trains an exploration policy using rewards automatically generated from the sensitivity of a privileged task policy to errors in the estimated properties. We also introduce an uncertainty-based mechanism to determine when to transition from exploration to task execution, ensuring sufficient estimation accuracy with minimal exploration time. Our method achieves a 90% success rate on the striking task with an average exploration time under 1.2 seconds, significantly outperforming baselines that achieve at most 40% success or require inefficient simulator querying and retraining at test time. Additionally, we demonstrate that our task-informed rewards capture the relative importance of physical properties in both the striking task and the classical CartPole example. Finally, we validate our approach on a physical setup with the KUKA iiwa robot arm, demonstrating its ability to identify object properties and adjust task execution accordingly.
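To make the core idea concrete, here is a minimal sketch of how a sensitivity-weighted exploration reward and an uncertainty-based switching rule could be implemented. This is not the paper's implementation: the helper names, the finite-difference sensitivity estimate (in place of the paper's gradient sensitivity), and the threshold `tau` are all illustrative assumptions.

```python
import numpy as np

def sensitivity_weights(task_policy, theta, eps=1e-3):
    """Estimate how sensitive the privileged task policy's action is to
    each physical parameter, via finite differences (hypothetical helper;
    the paper uses gradient sensitivity). Returns normalized weights."""
    base = task_policy(theta)
    w = np.zeros(len(theta))
    for i in range(len(theta)):
        pert = theta.copy()
        pert[i] += eps
        w[i] = np.linalg.norm(task_policy(pert) - base) / eps
    return w / (w.sum() + 1e-8)

def exploration_reward(theta_est, theta_true, weights):
    """Error-driven reward: penalize estimation error more heavily on
    parameters the task policy is sensitive to."""
    return -float(np.sum(weights * np.abs(theta_est - theta_true)))

def should_switch(posterior_std, weights, tau=0.05):
    """Uncertainty-based switch: stop exploring and execute the task once
    the sensitivity-weighted parameter uncertainty drops below tau
    (tau is an assumed tuning constant)."""
    return float(np.sum(weights * posterior_std)) < tau
```

For example, with a toy policy `lambda th: np.array([2.0 * th[0], 0.1 * th[1]])`, the first parameter receives a much larger weight, so exploration is steered toward reducing its estimation error first, and the switch fires only once the weighted posterior uncertainty is small.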