🤖 AI Summary
This work addresses key limitations of deep reinforcement learning in real-world applications: low sample efficiency, poor interpretability, weak cross-task transferability, and sensitivity to environmental changes. The authors propose a large language model (LLM)-driven semantic closed-loop framework that integrates the general knowledge of LLMs with symbolic planning. By generating executable rules from natural language instructions and semantically annotating automatically discovered options, the framework enables semantic-guided skill reuse and dynamic constraint monitoring. Experiments on Office World and Montezuma’s Revenge show improvements in sample efficiency, adherence to task constraints, and cross-task adaptability.
📝 Abstract
Despite remarkable success in complex tasks, Deep Reinforcement Learning (DRL) still suffers from critical issues in practical applications, such as low data efficiency, lack of interpretability, and limited cross-environment transferability. Moreover, learned policies that generate actions from states are sensitive to environmental changes and struggle to guarantee behavioral safety and compliance. Recent research shows that integrating Large Language Models (LLMs) with symbolic planning is a promising way to address these challenges. Inspired by this, we introduce a novel LLM-driven closed-loop framework that enables semantic-driven skill reuse and real-time constraint monitoring by mapping natural language instructions into executable rules and semantically annotating automatically created options. The proposed approach uses the general knowledge of LLMs to improve exploration efficiency and to adapt transferable options to similar environments, and it provides inherent interpretability through semantic annotations. To validate the framework, we conduct experiments on two domains, Office World and Montezuma's Revenge. The results demonstrate superior performance in data efficiency, constraint compliance, and cross-task transferability.
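To make the two central ideas concrete (executable rules derived from instructions, and semantically annotated options that can be looked up for reuse), the following is a minimal illustrative sketch. It is not the authors' implementation. The names `Rule`, `Option`, `monitor`, and `select_option`, the proposition names, and the hand-written rule and annotations are all assumptions made for illustration; in the framework described above, the rules and annotations would be produced by an LLM rather than written by hand.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Symbolic state: proposition name -> truth value (hypothetical representation).
State = Dict[str, bool]


@dataclass
class Rule:
    """An executable constraint; assumed here to come from a natural language instruction."""
    description: str
    violated: Callable[[State], bool]


@dataclass
class Option:
    """A temporally extended skill carrying a semantic annotation."""
    name: str
    annotation: str  # natural-language label (would be LLM-written in the framework)
    achieves: str    # proposition that holds when the option terminates


def monitor(state: State, rules: List[Rule]) -> List[str]:
    """Return the descriptions of every rule the given state violates."""
    return [rule.description for rule in rules if rule.violated(state)]


def select_option(goal: str, library: List[Option]) -> Optional[Option]:
    """Reuse the first stored option whose annotated effect matches the goal."""
    for option in library:
        if option.achieves == goal:
            return option
    return None


# Hypothetical Office World-style example, hand-written for illustration only.
rules = [
    Rule("never step on a decoration",
         lambda s: s.get("on_decoration", False)),
]
library = [
    Option("get_coffee", "walk to the coffee machine and pick up coffee", "has_coffee"),
    Option("go_office", "walk to the office", "at_office"),
]

violations = monitor({"on_decoration": True, "has_coffee": False}, rules)
reused = select_option("has_coffee", library)
```

In this sketch the monitor is checked against each new symbolic state, so a violation can be flagged as soon as it occurs, and the option library is queried by the annotated effect, which is one simple way to express semantic-guided reuse across similar tasks.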