Boosting deep Reinforcement Learning using pretraining with Logical Options

📅 2026-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge that deep reinforcement learning agents often behave myopically, over-exploiting misleading early rewards and failing at long-horizon tasks. The authors propose H²RL, a two-stage hybrid framework: first, logic-based option pretraining injects symbolic structure into neural policies, mirroring how humans acquire new skills; the policy is then fine-tuned through standard environment interaction to balance goal-directedness with continuous control. By combining the strengths of neural and symbolic approaches, H²RL outperforms purely neural, purely symbolic, and existing neuro-symbolic baselines on multiple long-horizon decision-making tasks.

📝 Abstract
Deep reinforcement learning agents are often misaligned, as they over-exploit early reward signals. Recently, several symbolic approaches have addressed these challenges by encoding sparse objectives along with aligned plans. However, purely symbolic architectures are complex to scale and difficult to apply to continuous settings. Hence, we propose a hybrid approach, inspired by humans' ability to acquire new skills. We use a two-stage framework that injects symbolic structure into neural-based reinforcement learning agents without sacrificing the expressivity of deep policies. Our method, called Hybrid Hierarchical RL (H²RL), introduces a logical option-based pretraining strategy to steer the learning policy away from short-term reward loops and toward goal-directed behavior, while allowing the final policy to be refined via standard environment interaction. Empirically, we show that this approach consistently improves long-horizon decision-making and yields agents that outperform strong neural, symbolic, and neuro-symbolic baselines.
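The pretraining idea in the abstract — logical options that emit sparse subgoal pseudo-rewards, sequenced by a symbolic plan, steering the agent past short-term reward loops — can be sketched as follows. This is a minimal illustrative toy, not the paper's implementation: the `LogicalOption` class, the chain environment, and all names are assumptions for exposition only.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical sketch: a "logical option" pairs a low-level policy with
# symbolic initiation/termination predicates; a plan sequences options
# toward a long-horizon goal. On termination, the option yields a sparse
# pseudo-reward (the pretraining signal replacing misleading env rewards).

@dataclass
class LogicalOption:
    name: str
    policy: Callable[[int], int]       # state -> action (here: -1 or +1 step)
    initiation: Callable[[int], bool]  # symbolic precondition
    termination: Callable[[int], bool] # symbolic subgoal reached

def run_plan(plan: List[LogicalOption], state: int,
             max_steps: int = 100) -> Tuple[int, int]:
    """Execute options in order, accumulating +1 per satisfied subgoal."""
    pseudo_return = 0
    for opt in plan:
        assert opt.initiation(state), f"{opt.name} not applicable in {state}"
        for _ in range(max_steps):
            state += opt.policy(state)
            if opt.termination(state):
                pseudo_return += 1
                break
    return state, pseudo_return

# Toy long-horizon chain task: walk right to position 5, then return to 0.
reach = LogicalOption("reach_5", lambda s: +1,
                      lambda s: s < 5, lambda s: s == 5)
home = LogicalOption("return_0", lambda s: -1,
                     lambda s: s > 0, lambda s: s == 0)

final_state, pseudo_return = run_plan([reach, home], state=0)
# → final_state == 0, pseudo_return == 2 (both subgoals satisfied)
```

In the paper's setting, the scripted option policies above would instead be learned neural sub-policies trained against these pseudo-rewards, and the resulting policy would then be fine-tuned on the real environment reward.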
Problem

Research questions and friction points this paper is trying to address.

deep reinforcement learning
reward misalignment
long-horizon decision-making
goal-directed behavior
sparse rewards
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid Hierarchical RL
logical options
pretraining
neuro-symbolic reinforcement learning
long-horizon decision-making