Exploiting Hybrid Policy in Reinforcement Learning for Interpretable Temporal Logic Manipulation

📅 2024-10-14
🏛️ IEEE/RSJ International Conference on Intelligent Robots and Systems
📈 Citations: 0
Influential: 0
🤖 AI Summary
Reinforcement learning for long-horizon robotic manipulation suffers from low sample efficiency and semantic ambiguity, which often lead to slow convergence or task failure. To address this, the paper proposes HyTL, a temporal-logic-guided hybrid policy framework with three decoupled levels. HyTL encodes task specifications in Linear Temporal Logic (LTL) and decomposes policy learning into high-level waypoint planning, mid-level behavior primitive selection, and low-level parametric execution, with feedback-driven co-optimization across layers. Compared to conventional hierarchical RL, HyTL improves policy interpretability and exploration efficiency. Experiments on four complex manipulation tasks show that HyTL achieves an average 32.7% improvement in task success rate and reduces convergence steps by 41.5%, while generating human-readable, logic-grounded decision rationales.

📝 Abstract
Reinforcement Learning (RL) based methods have been increasingly explored for robot learning. However, RL-based methods often suffer from low sample efficiency in the exploration phase, especially for long-horizon manipulation tasks, and generally neglect semantic information at the task level, resulting in delayed convergence or even task failure. To tackle these challenges, we propose a Temporal-Logic-guided Hybrid policy framework (HyTL), which leverages three decision levels to improve the agent's performance. Specifically, the task specifications are encoded via linear temporal logic (LTL) to improve performance and offer interpretability. A waypoint planning module, designed with feedback from the LTL-encoded task level, serves as the high-level policy to improve exploration efficiency. The middle-level policy selects which behavior primitive to execute, and the low-level policy specifies the corresponding parameters to interact with the environment. We evaluate HyTL on four challenging manipulation tasks, and the results demonstrate its effectiveness and interpretability. Our project is available at: https://sites.google.com/view/hytl-0257/.
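The three-level decomposition described in the abstract can be illustrated with a toy sketch. This is only one reading of the abstract, not the authors' implementation: the formula F(reach & F(grasp)), the hand-written automaton, the waypoint coordinates, and all function names below are hypothetical, and each "policy" is a fixed lookup standing in for a learned network.

```python
from dataclasses import dataclass

# Hypothetical hand-written automaton for the illustrative LTL formula
# F(reach & F(grasp)): state 0 -> 1 once "reach" holds, 1 -> 2 once "grasp" holds.
TRANSITIONS = {(0, "reach"): 1, (1, "grasp"): 2}
ACCEPTING = 2


def automaton_step(state, true_props):
    """Advance the task automaton given the propositions that currently hold."""
    for prop in true_props:
        nxt = TRANSITIONS.get((state, prop))
        if nxt is not None:
            return nxt
    return state


@dataclass
class Action:
    primitive: str   # discrete choice made by the middle level
    params: tuple    # continuous parameters produced by the low level


def high_level(state):
    """Pick a waypoint from the task state (stand-in for a learned planner)."""
    return {0: (0.5, 0.0, 0.2), 1: (0.5, 0.0, 0.05)}.get(state)


def mid_level(state):
    """Pick a behavior primitive (stand-in for a learned selector)."""
    return "reach" if state == 0 else "grasp"


def low_level(primitive, waypoint):
    """Produce the primitive's parameters (stand-in for a learned head)."""
    if primitive == "reach":
        return waypoint
    return waypoint + (1.0,)  # assumed extra value: gripper-close command


def hybrid_policy(state):
    """Compose the three levels into one hybrid (discrete + continuous) action."""
    waypoint = high_level(state)
    primitive = mid_level(state)
    return Action(primitive, low_level(primitive, waypoint))


def run_episode(max_steps=5):
    """Roll out the policy in a toy environment where every primitive succeeds."""
    state, trace = 0, []
    for _ in range(max_steps):
        if state == ACCEPTING:
            break
        action = hybrid_policy(state)
        trace.append((state, action.primitive))
        # Toy assumption: executing a primitive makes its proposition true.
        state = automaton_step(state, {action.primitive})
    return state, trace
```

Calling `run_episode()` returns the final automaton state together with a trace of (task state, primitive) pairs. Such a trace is the kind of logic-grounded record that makes a temporal-logic-guided policy easier to inspect.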
Problem

Research questions and friction points this paper is trying to address.

Reinforcement Learning
Information Gathering
Learning Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

HyTL
Reinforcement Learning
Linear Temporal Logic
Hao Zhang
Department of Automation, University of Science and Technology of China, Hefei, Anhui, China, 230026
Hao Wang
Department of Automation, University of Science and Technology of China, Hefei, Anhui, China, 230026
Xiucai Huang
School of Automation, Chongqing University, Chongqing, China
Wenrui Chen
Hunan University
Robotics, Hands, Grasping, Dexterous Manipulation, Human-Robot Collaboration
Zhen Kan
University of Science and Technology of China
Nonlinear Control, Formal Methods, Robotics, Learning Systems