Learning Symbolic Persistent Macro-Actions for POMDP Solving Over Time

📅 2025-05-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Efficient and interpretable decision-making under temporal uncertainty in partially observable Markov decision processes (POMDPs) remains challenging. Method: We propose an event calculus-driven linear temporal logic (LTL) framework that automatically learns temporally persistent symbolic macro-actions from a few belief-action trajectories, integrating event calculus, LTL, and inductive logic programming (ILP) without handcrafted heuristics. The learned macro-actions are embedded into Monte Carlo tree search (MCTS) to drastically reduce inference overhead. Results: Evaluated on the Pocman and Rocksample benchmarks, our approach achieves superior expressivity and temporal adaptability compared to static heuristics, while maintaining robustness. It enables end-to-end learning using only the POMDP transition model—requiring no reward or observation models—and simultaneously ensures interpretability, generalization, and computational efficiency.

📝 Abstract
This paper proposes an integration of temporal logical reasoning and Partially Observable Markov Decision Processes (POMDPs) to achieve interpretable decision-making under uncertainty with macro-actions. Our method leverages a fragment of Linear Temporal Logic (LTL) based on Event Calculus (EC) to generate *persistent* (i.e., constant) macro-actions, which guide Monte Carlo Tree Search (MCTS)-based POMDP solvers over a time horizon, significantly reducing inference time while ensuring robust performance. Such macro-actions are learnt via Inductive Logic Programming (ILP) from a few traces of execution (belief-action pairs), thus eliminating the need for manually designed heuristics and requiring only the specification of the POMDP transition model. In the Pocman and Rocksample benchmark scenarios, our learned macro-actions demonstrate increased expressiveness and generality compared to time-independent heuristics, while offering substantial computational efficiency improvements.
Problem

Research questions and friction points this paper is trying to address.

Integrating temporal logic with POMDPs for interpretable decision-making under uncertainty
Learning persistent macro-actions via ILP to reduce POMDP inference time
Improving computational efficiency in benchmark scenarios using expressive macro-actions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates LTL and POMDPs for interpretable decisions
Uses ILP to learn macro-actions from traces
Employs MCTS with persistent macro-actions for efficiency
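The interplay of persistent macro-actions with an MCTS rollout policy can be illustrated with a minimal sketch. All names here (`PersistentMacro`, `rollout_policy`, the belief representation) are hypothetical and chosen for illustration only; in the paper such symbolic conditions are learnt via ILP from execution traces rather than hard-coded:

```python
import random

class PersistentMacro:
    """A macro-action that commits to one primitive action for up to
    max_len steps once a symbolic condition on the belief fires."""
    def __init__(self, action, condition, max_len):
        self.action = action          # primitive action to repeat
        self.condition = condition    # belief -> bool (learnt symbolically)
        self.max_len = max_len        # persistence horizon

def rollout_policy(belief, macros, primitives, rng):
    """Return the action sequence a triggered macro commits to, or a
    single random primitive if no macro's condition holds."""
    for m in macros:
        if m.condition(belief):
            # Persistence: the same action is held for max_len steps,
            # shrinking the effective branching factor of MCTS rollouts.
            return [m.action] * m.max_len
    return [rng.choice(primitives)]

# Toy belief for a Rocksample-like scenario: probability a rock is valuable.
macros = [PersistentMacro("sample", lambda b: b["p_valuable"] > 0.8, max_len=3)]
rng = random.Random(0)
print(rollout_policy({"p_valuable": 0.9}, macros, ["north", "south"], rng))
```

The design point is that committing to a fixed action over a horizon prunes the rollout search space, which is where the reported inference-time savings come from.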