🤖 AI Summary
This study investigates how the design of prior preference distributions in active inference agents affects inference and learning. Focusing on a grid-world navigation task, it systematically compares four configurations: soft versus hard goal specifications, each with and without goal shaping (i.e., intermediate targets). Within active inference, where planning minimizes expected free energy, goal-directed behavior is formalized as the KL divergence between the agent's beliefs and a preference distribution encoding its targets. Results show that goal shaping markedly improves task completion but concurrently impairs the agent's ability to learn the environment's true state-transition dynamics, revealing an exploration–exploitation trade-off. By quantitatively characterizing how structured preferences inhibit learning of the dynamics model, the work offers practical guidance for designing interpretable, controllable goals in active inference systems.
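For reference, a standard decomposition of expected free energy from the active inference literature (a sketch; the paper's exact notation and conventions may differ) separates this pragmatic KL term from an epistemic "ambiguity" term:

```latex
% Expected free energy G of a policy \pi, decomposed as risk + ambiguity.
% Q(o_\tau \mid \pi): predicted observations under the policy;
% P(o_\tau): the preference distribution (often written C) that defines the goal.
G(\pi) = \sum_{\tau}
  \underbrace{D_{\mathrm{KL}}\!\left[\, Q(o_\tau \mid \pi) \,\middle\|\, P(o_\tau) \,\right]}_{\text{risk: pragmatic, goal-seeking term}}
  + \underbrace{\mathbb{E}_{Q(s_\tau \mid \pi)}\!\left[\, \mathrm{H}\!\left[ P(o_\tau \mid s_\tau) \right] \,\right]}_{\text{ambiguity: epistemic term}}
```

Under this reading, hard versus soft goals differ in how peaked P(o_τ) is, and goal shaping makes P(o_τ) time-dependent.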
📝 Abstract
Active inference proposes expected free energy as an objective for planning and decision-making that balances exploitative and explorative drives in learning agents. The exploitative drive, i.e., what an agent wants to achieve, is formalised as the Kullback-Leibler divergence between a variational probability distribution, updated at each inference step, and a preference probability distribution that indicates which states or observations are more likely for the agent, thereby determining the agent's goal in a given environment. The questions of how the preference distribution should be specified, and of how a particular specification impacts inference and learning in an active inference agent, have received hardly any attention in the literature. In this work, we consider four possible ways of defining the preference distribution: providing the agent with either hard or soft goals, each with or without goal shaping (i.e., intermediate goals). We compare the performance of four agents, each given one of the possible preference distributions, in a grid-world navigation task. Our results show that goal shaping enables the best overall performance (i.e., it promotes exploitation) while sacrificing learning of the environment's transition dynamics (i.e., it hampers exploration).
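To make the four configurations concrete, here is a minimal sketch of how they might be encoded as preference distributions over grid cells. Everything here is a hypothetical illustration: the function names, grid size, waypoint indices, and temperature are my assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical 4x4 grid world; states indexed row-major.
SIDE = 4
N = SIDE * SIDE
coords = [(i // SIDE, i % SIDE) for i in range(N)]

def hard_preference(goal: int) -> np.ndarray:
    """Hard goal: (near-)all preference mass on the goal state.
    A small epsilon on the other states keeps KL divergences finite."""
    p = np.full(N, 1e-6)
    p[goal] = 1.0
    return p / p.sum()

def soft_preference(goal: int, temperature: float = 1.0) -> np.ndarray:
    """Soft goal: preference mass decays with Manhattan distance to the goal."""
    gx, gy = coords[goal]
    dist = np.array([abs(x - gx) + abs(y - gy) for x, y in coords])
    p = np.exp(-dist / temperature)
    return p / p.sum()

def shaped_preference(base, waypoints, step: int) -> np.ndarray:
    """Goal shaping: at each step the preference targets the current
    intermediate goal, advancing through the waypoints to the final goal."""
    current = waypoints[min(step, len(waypoints) - 1)]
    return base(current)

# Example: a hard, shaped preference sequence along three waypoints.
waypoints = [5, 10, 15]  # intermediate cells ending at the goal cell
prefs = [shaped_preference(hard_preference, waypoints, t) for t in range(4)]
```

In the risk term of expected free energy, each such vector would play the role of the preference distribution; under goal shaping the target becomes time-dependent, which plausibly explains the paper's finding that the agent is nudged efficiently toward the goal at the cost of less self-directed exploration of the transition dynamics.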