🤖 AI Summary
To address the low exploration efficiency and weak goal-completion capability of Monte Carlo Tree Search (MCTS) in continuous-control tasks under uncertainty, this paper proposes the first method to embed the free-energy-minimization principle from active inference into the MCTS framework. By unifying external reward maximization and epistemic uncertainty reduction (i.e., information gain) within a single objective, the approach inherently balances exploration and exploitation. It employs the Cross-Entropy Method (CEM) to optimize root-node actions and introduces a tree-expansion mechanism that jointly incorporates reward modeling and intrinsic exploration rewards. Experiments across multiple continuous-control benchmarks show that the method significantly outperforms standalone CEM and standard MCTS with random rollouts, achieving a superior trade-off among planning consistency, task success rate, and computational efficiency, and marking a principled advance in uncertainty-aware decision-making for continuous control.
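The unified objective described above can be illustrated with a minimal sketch. Here a tree node is scored by its extrinsic expected reward plus an epistemic bonus, taken as the expected information gain (entropy reduction of a Gaussian belief), which mirrors minimizing negative expected free energy. All names (`node_value`, `beta`, the Gaussian belief model) are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def node_value(expected_reward, posterior_var, prior_var, beta=0.1):
    """Score a node by extrinsic reward plus an information-gain bonus.

    info_gain is the entropy drop (in nats) of a 1-D Gaussian belief
    whose variance shrinks from prior_var to posterior_var after the
    imagined observation. beta trades exploitation against exploration.
    """
    info_gain = 0.5 * np.log(prior_var / posterior_var)
    return expected_reward + beta * info_gain
```

With `beta = 0`, the score reduces to plain reward maximization; a larger `beta` biases the search toward states whose outcomes the agent's model is most uncertain about.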
📝 Abstract
Active Inference, grounded in the Free Energy Principle, provides a powerful lens for understanding how agents balance exploration and goal-directed behavior in uncertain environments. Here, we propose a new planning framework that integrates Monte Carlo Tree Search (MCTS) with active inference objectives to systematically reduce epistemic uncertainty while pursuing extrinsic rewards. Our key insight is that MCTS, already renowned for its search efficiency, can be naturally extended to incorporate free energy minimization by blending expected rewards with information gain. Concretely, the Cross-Entropy Method (CEM) is used to optimize action proposals at the root node, while tree expansions leverage reward modeling alongside intrinsic exploration bonuses. This synergy allows our planner to maintain coherent estimates of value and uncertainty throughout planning without sacrificing computational tractability. Empirically, we benchmark our planner on a diverse set of continuous control tasks, where it demonstrates performance gains over both standalone CEM and MCTS with random rollouts.
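The root-node CEM step can be sketched as follows. This is a generic Cross-Entropy Method over action sequences, a minimal sketch assuming a Gaussian sampling distribution and a black-box `score_fn` (which in the paper's setting would combine the reward model with the intrinsic exploration bonus); the function name and all hyperparameters are illustrative:

```python
import numpy as np

def cem_plan(score_fn, action_dim, horizon=5, pop=64, elites=8, iters=5):
    """Cross-Entropy Method: iteratively refit a diagonal Gaussian over
    action sequences toward the top-scoring (elite) samples, and return
    the first action of the refined mean sequence."""
    mu = np.zeros((horizon, action_dim))
    sigma = np.ones((horizon, action_dim))
    for _ in range(iters):
        # Sample a population of candidate action sequences.
        samples = mu + sigma * np.random.randn(pop, horizon, action_dim)
        scores = np.array([score_fn(s) for s in samples])
        # Refit the Gaussian to the highest-scoring candidates.
        elite = samples[np.argsort(scores)[-elites:]]
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu[0]
```

In the framework described above, this distribution refinement replaces uniform action sampling at the root, so subsequent tree expansions start from proposals already shaped by the blended reward-plus-information-gain objective.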