🤖 AI Summary
In inaccessible environments with uncertain task requirements, the longevity of a general-purpose tool depends heavily on how it is used. This paper proposes a reinforcement learning framework that jointly optimizes task completion and tool longevity. The method estimates the tool's remaining useful life (RUL) from accumulated stress using finite element analysis and Miner's rule, and incorporates the RUL into the reward. Because RUL can only be estimated after task execution, an adaptive reward normalization mechanism adjusts reward scaling based on the estimated RULs to keep the learning signal stable. In simulation, the approach achieves up to an 8.01× improvement in tool lifespan. The learned policies also transfer to real-world settings; the evaluated tasks include Object-Moving and Door-Opening with multiple general-purpose tools. To the best of our knowledge, this is the first work to achieve co-optimization of task performance and physical durability in general-purpose tool manipulation.
📝 Abstract
In inaccessible environments with uncertain task demands, robots often rely on general-purpose tools that lack predefined usage strategies. These tools are not tailored for particular operations, making their longevity highly sensitive to how they are used. This creates a fundamental challenge: how can a robot learn a tool-use policy that both completes the task and prolongs the tool's lifespan? In this work, we address this challenge by introducing a reinforcement learning (RL) framework that incorporates tool lifespan as a factor during policy optimization. Our framework leverages Finite Element Analysis (FEA) and Miner's Rule to estimate Remaining Useful Life (RUL) based on accumulated stress, and integrates the RUL into the RL reward to steer policy learning toward lifespan-extending behavior. Because RUL can only be estimated after task execution, we introduce an Adaptive Reward Normalization (ARN) mechanism that dynamically adjusts reward scaling based on estimated RULs, ensuring stable learning signals. We validate our method across simulated and real-world tool-use tasks, including Object-Moving and Door-Opening with multiple general-purpose tools. The learned policies consistently prolong tool lifespan (up to 8.01× in simulation) and transfer effectively to real-world settings, demonstrating the practical value of learning lifespan-guided tool-use strategies.
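To make the RUL-as-reward idea more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that the per-cycle stress amplitudes of one episode are already available (for example, extracted from an FEA stress history), that the tool material follows a Basquin-form S-N curve with made-up constants `sigma_f` and `b`, and that RUL is expressed as the number of times the episode could be repeated before Miner's damage sum reaches 1. The abstract does not specify how ARN rescales rewards, so `RunningRangeNormalizer` is only a hypothetical stand-in based on running min-max scaling.

```python
import math


def miner_damage(cycle_counts, cycles_to_failure):
    """Miner's rule: D = sum(n_i / N_i); failure is predicted when D >= 1."""
    return sum(n / N for n, N in zip(cycle_counts, cycles_to_failure))


def basquin_cycles_to_failure(stress_amplitude, sigma_f=900.0, b=-0.1):
    """Cycles to failure from a Basquin S-N curve, sigma_a = sigma_f * (2N)^b.

    sigma_f and b are illustrative placeholders, not values from the paper.
    """
    return 0.5 * (stress_amplitude / sigma_f) ** (1.0 / b)


def remaining_useful_life(stress_amplitudes):
    """RUL in episode repetitions (assumed definition): 1 / per-episode damage.

    Each entry of stress_amplitudes is treated as one load cycle.
    Non-positive amplitudes are assumed to cause no damage and are skipped.
    """
    loaded = [s for s in stress_amplitudes if s > 0]
    damage = miner_damage(
        [1] * len(loaded),
        [basquin_cycles_to_failure(s) for s in loaded],
    )
    return math.inf if damage == 0 else 1.0 / damage


class RunningRangeNormalizer:
    """Hypothetical stand-in for ARN: rescale RUL to [0, 1] using the
    smallest and largest finite RUL values observed so far."""

    def __init__(self):
        self.lo = math.inf
        self.hi = -math.inf

    def __call__(self, rul):
        if not math.isfinite(rul):
            return 1.0  # undamaged episode: treat as the best case
        self.lo = min(self.lo, rul)
        self.hi = max(self.hi, rul)
        if self.hi == self.lo:
            return 0.0  # no spread observed yet
        return (rul - self.lo) / (self.hi - self.lo)


def episode_reward(task_reward, rul, normalizer, weight=1.0):
    """Terminal reward: task term plus weighted, normalized RUL term."""
    return task_reward + weight * normalizer(rul)
```

In this sketch the RUL term is added once, at the end of an episode, which mirrors the statement that RUL can only be estimated after task execution. Because the normalizer's range widens as training discovers gentler or harsher strategies, the same raw RUL can map to different reward values over time; that is the kind of scale drift an adaptive scheme has to manage. The `weight` parameter and the additive form of `episode_reward` are assumptions for illustration.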