🤖 AI Summary
Subgoal discovery in reinforcement learning suffers from difficulties in identifying meaningful subgoals, low sample efficiency, and reliance on handcrafted reward shaping informed by domain priors. Method: This paper proposes an unsupervised subgoal discovery framework grounded in the Free Energy Principle (FEP). It constructs a simplified model via state-space aggregation and quantifies model discrepancy between the original and aggregated spaces using neighborhood transition divergence. Subgoals are automatically identified as states exhibiting peak unpredictability—i.e., local maxima in transition entropy—thereby eliminating heuristic assumptions based on distance or coverage. Contribution/Results: To our knowledge, this is the first work to formalize subgoal discovery through free energy minimization, requiring no task-specific prior knowledge. Evaluated on diverse stochastic grid-world navigation tasks, the method achieves stable and robust subgoal extraction, significantly accelerating policy convergence and improving generalization—especially under high observation noise and stochastic transition dynamics.
📝 Abstract
Reinforcement learning (RL) plays a major role in solving complex sequential decision-making tasks. Hierarchical and goal-conditioned RL are promising approaches to two major problems in RL, namely sample inefficiency and the difficulty of reward shaping. These methods tackle both problems by decomposing a task into simpler subtasks and by temporally abstracting the task in the action space. A key component of task decomposition in these methods is subgoal discovery: subgoal states can be used to define hierarchies of actions and to decompose complex tasks. Under the assumption that subgoal states are more unpredictable than other states, we propose a free energy paradigm to discover them. This is achieved by using free energy to select between two spaces, the main space and an aggregation space. The *model change* from neighboring states to a given state indicates that state's unpredictability, and is therefore used in this paper for subgoal discovery. Our empirical results on navigation tasks such as grid-world environments show that the proposed method can discover subgoals without prior knowledge of the task, and that it is robust to the stochasticity of environments.
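To make the core criterion concrete, here is a minimal sketch (not the paper's implementation) of the idea that subgoals are states of peak unpredictability: compute an empirical transition entropy for each grid-world state and mark states whose entropy strictly exceeds that of all 4-neighbors as subgoal candidates. The grid layout, neighborhood, and function names are illustrative assumptions.

```python
import numpy as np

def transition_entropy(counts):
    """Shannon entropy of an empirical next-state distribution,
    given raw visit counts for each successor state."""
    total = counts.sum()
    if total == 0:
        return 0.0
    p = counts / total
    p = p[p > 0]  # 0 * log 0 is taken as 0
    return float(-(p * np.log(p)).sum())

def subgoal_candidates(entropy_grid):
    """States whose entropy is a strict local maximum over the
    4-neighborhood (up/down/left/right) of a 2D grid world."""
    h, w = entropy_grid.shape
    subgoals = []
    for i in range(h):
        for j in range(w):
            nbrs = [entropy_grid[x, y]
                    for x, y in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                    if 0 <= x < h and 0 <= y < w]
            if nbrs and entropy_grid[i, j] > max(nbrs):
                subgoals.append((i, j))
    return subgoals
```

For example, a doorway state connecting two rooms would see transitions into several distinct regions, giving it higher transition entropy than its hallway neighbors, so it would be flagged as a candidate.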