LLM-assisted Semantic Option Discovery for Facilitating Adaptive Deep Reinforcement Learning

📅 2026-03-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses key limitations of deep reinforcement learning in real-world applications: low sample efficiency, poor interpretability, weak cross-task transferability, and sensitivity to environmental changes. The authors propose a large language model (LLM)-driven semantic closed-loop framework that integrates the general knowledge of LLMs with symbolic planning. By generating executable rules from natural language instructions and semantically annotating automatically discovered options, the framework enables semantically guided skill reuse and dynamic constraint monitoring. Experimental results on Office World and Montezuma's Revenge demonstrate significant improvements in sample efficiency, adherence to task constraints, and cross-task adaptability.

📝 Abstract
Despite achieving remarkable success in complex tasks, Deep Reinforcement Learning (DRL) still suffers from critical issues in practical applications, such as low data efficiency, lack of interpretability, and limited cross-environment transferability. Moreover, the learned policy, which generates actions based on states, is sensitive to environmental changes and struggles to guarantee behavioral safety and compliance. Recent research shows that integrating Large Language Models (LLMs) with symbolic planning is promising for addressing these challenges. Inspired by this, we introduce a novel LLM-driven closed-loop framework, which enables semantic-driven skill reuse and real-time constraint monitoring by mapping natural language instructions into executable rules and semantically annotating automatically created options. The proposed approach utilizes the general knowledge of LLMs to improve exploration efficiency and adapt transferable options to similar environments, and it provides inherent interpretability through semantic annotations. To validate the effectiveness of this framework, we conduct experiments on two domains, Office World and Montezuma's Revenge. The results demonstrate superior performance in data efficiency, constraint compliance, and cross-task transferability.
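The closed loop the abstract describes (natural-language instructions mapped to executable rules, semantically annotated options, and runtime constraint monitoring) can be illustrated with a minimal sketch. All names here (`SemanticOption`, `instruction_to_rule`, the rule format) are illustrative assumptions, not the paper's actual interface, and the LLM call is stubbed with a hard-coded mapping:

```python
# Hypothetical sketch of the closed loop: (1) an instruction becomes an
# executable rule, (2) discovered options carry semantic annotations,
# (3) a monitor vetoes options whose predicted effects violate a rule.
from dataclasses import dataclass
from typing import Callable

@dataclass
class SemanticOption:
    name: str          # LLM-provided semantic annotation, e.g. "go to door"
    effects: set[str]  # symbolic facts the option is expected to make true

# Stand-in for an LLM call that translates an instruction into a rule.
# A real system would prompt the model; here the mapping is hard-coded.
def instruction_to_rule(instruction: str) -> Callable[[set[str]], bool]:
    if "avoid" in instruction:
        banned = instruction.split("avoid ")[1].strip()
        return lambda facts: banned not in facts
    return lambda facts: True  # no recognized constraint

def monitor(option: SemanticOption, state_facts: set[str],
            rules: list[Callable[[set[str]], bool]]) -> bool:
    """Allow an option only if its predicted effects satisfy every rule."""
    predicted = state_facts | option.effects
    return all(rule(predicted) for rule in rules)

rules = [instruction_to_rule("avoid lava")]
safe = SemanticOption("go to door", {"at_door"})
unsafe = SemanticOption("cross lava", {"lava"})
print(monitor(safe, {"at_start"}, rules))    # True
print(monitor(unsafe, {"at_start"}, rules))  # False
```

The design choice of checking predicted symbolic effects, rather than raw states, is what makes the monitoring interpretable: a vetoed option can be explained by naming the rule and the offending fact.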
Problem

Research questions and friction points this paper is trying to address.

Deep Reinforcement Learning
data efficiency
interpretability
transferability
behavioral safety
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-assisted semantic option discovery
adaptive deep reinforcement learning
semantic skill reuse
real-time constraint monitoring
cross-environment transferability