Exploiting Hybrid Policy in Reinforcement Learning for Interpretable Temporal Logic Manipulation

📅 2024-10-14

🏛️ IEEE/RSJ International Conference on Intelligent Robots and Systems

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address low sample efficiency and semantic ambiguity in reinforcement learning for long-horizon robotic manipulation tasks, which often lead to slow convergence or failure, this paper proposes HyTL, a temporal-logic-guided hybrid policy framework with three decoupled decision levels. HyTL encodes task specifications in Linear Temporal Logic (LTL) and decomposes policy learning into three hierarchical levels: high-level waypoint planning, mid-level behavior primitive selection, and low-level parameter selection, with feedback-driven co-optimization across layers. Compared to conventional hierarchical RL, HyTL significantly improves policy interpretability and exploration efficiency. Experiments on four complex manipulation tasks demonstrate that HyTL achieves an average 32.7% improvement in task success rate and reduces convergence steps by 41.5%, while generating human-readable, logic-grounded decision rationales.

📝 Abstract

Reinforcement Learning (RL) based methods have been increasingly explored for robot learning. However, RL-based methods often suffer from low sample efficiency in the exploration phase, especially for long-horizon manipulation tasks, and generally neglect task-level semantic information, resulting in delayed convergence or even task failure. To tackle these challenges, we propose a Temporal-Logic-guided Hybrid policy framework (HyTL), which leverages three levels of decision layers to improve the agent's performance. Specifically, the task specifications are encoded via linear temporal logic (LTL) to improve performance and offer interpretability. A waypoint planning module, which receives feedback from the LTL-encoded task level, serves as the high-level policy to improve exploration efficiency. The middle-level policy selects which behavior primitive to execute, and the low-level policy specifies the corresponding parameters for interacting with the environment. We evaluate HyTL on four challenging manipulation tasks; the results demonstrate its effectiveness and interpretability. Our project is available at: https://sites.google.com/view/hytl-0257/.
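
The three-level decomposition described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the subgoal names, the `WAYPOINTS` and `PRIMITIVES` tables, the `SequenceMonitor` class, and the trivial environment are all hypothetical, and fixed lookup tables stand in for the learned policies. The monitor tracks only a sequenced specification of the form F(a ∧ F(b ∧ F(c))), a small fragment of LTL.

```python
from dataclasses import dataclass

@dataclass
class SequenceMonitor:
    """Tracks progress through an ordered list of subgoals (hypothetical helper)."""
    subgoals: tuple
    index: int = 0

    @property
    def done(self):
        return self.index == len(self.subgoals)

    @property
    def pending(self):
        # The next subgoal that must become true, or None once the task is satisfied.
        return None if self.done else self.subgoals[self.index]

    def step(self, true_props):
        """Advance if the pending subgoal holds; return how many subgoals remain."""
        if not self.done and self.subgoals[self.index] in true_props:
            self.index += 1
        return len(self.subgoals) - self.index

# Assumed lookup tables standing in for learned high- and mid-level policies.
WAYPOINTS = {"grasp": (0.4, 0.0, 0.05), "lift": (0.4, 0.0, 0.30), "place": (0.1, 0.3, 0.05)}
PRIMITIVES = {"grasp": "close_gripper", "lift": "move_to", "place": "open_gripper"}

def high_level(monitor):
    """High level: choose a waypoint using task-level feedback (the pending subgoal)."""
    return WAYPOINTS[monitor.pending]

def mid_level(monitor):
    """Middle level: choose which behavior primitive to execute."""
    return PRIMITIVES[monitor.pending]

def low_level(primitive, waypoint):
    """Low level: attach parameters (here, just the target waypoint) to the primitive."""
    return {"primitive": primitive, "target": waypoint}

def rollout(subgoals):
    """Run the hierarchy in a toy environment where every action achieves its subgoal."""
    monitor = SequenceMonitor(tuple(subgoals))
    trace = []
    while not monitor.done:
        goal = monitor.pending
        action = low_level(mid_level(monitor), high_level(monitor))
        trace.append((goal, action["primitive"]))
        monitor.step({goal})  # toy assumption: the action always succeeds
    return trace

if __name__ == "__main__":
    for goal, primitive in rollout(["grasp", "lift", "place"]):
        print(f"subgoal={goal} primitive={primitive}")
```

The returned trace pairs each subgoal with the primitive chosen for it, which hints at how a logic-level specification can make the agent's decisions readable. In a real system the toy environment would be replaced by a simulator and the lookup tables by trained policies.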

Problem

Research questions and friction points this paper is trying to address.

Reinforcement Learning

Information Gathering

Learning Efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

HyTL

Reinforcement Learning

Linear Temporal Logic

🔎 Similar Papers

On Generating Explanations for Reinforcement Learning Policies: An Empirical Study