AI Summary
In non-Markovian environments, reinforcement learning suffers from partial observability and long-range dependencies, rendering the Bellman equation only approximately valid and lacking a theoretical characterization of modelable dynamics. This work introduces cohomology theory from algebraic topology to address this challenge, interpreting temporal difference (TD) errors as 1-cochains over the state-transition space. By applying a Bellman-de Rham projection, the TD error is decomposed via a Hodge-type decomposition into an exact (integrable) component and a topological residual. Building on this insight, we propose the HodgeFlow Policy Search (HFPS) algorithm, which integrates a potential-function neural network with policy optimization. HFPS achieves significantly improved performance in non-Markovian tasks and comes with theoretical guarantees on stability and sensitivity.
Abstract
Non-Markovian dynamics are commonly found in real-world environments due to long-range dependencies, partial observability, and memory effects. The Bellman equation, the central pillar of reinforcement learning (RL), becomes only approximately valid under non-Markovian dynamics. Existing work often focuses on practical algorithm design and offers limited theoretical treatment of key questions, such as which dynamics are actually capturable by the Bellman framework and how to inspire new algorithm classes with optimal approximations. In this paper, we present a novel topological viewpoint on temporal-difference (TD) based RL. We show that TD errors can be viewed as 1-cochains in the topological space of state transitions, while Markov dynamics are then interpreted as topological integrability. This view yields a Hodge-type decomposition of TD errors into an integrable component and a topological residual, obtained through a Bellman-de Rham projection. We further propose HodgeFlow Policy Search (HFPS), which fits a potential network to minimize the non-integrable projection residual in RL and achieves stability/sensitivity guarantees. In numerical evaluations, HFPS is shown to significantly improve RL performance under non-Markovian dynamics.
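The abstract does not give the algorithmic details, but the decomposition it describes has a simple finite analogue on a state-transition graph: treat TD errors as values on edges (a discrete 1-cochain) and least-squares-project them onto gradients of a node potential; the remainder is the non-integrable, cycle-supported residual. The following is a minimal illustrative sketch of that projection under these assumptions; the edge data, the incidence-matrix formulation, and all variable names are hypothetical, not taken from the paper.

```python
# Hypothetical sketch: graph-Hodge split of edge-wise TD errors into an
# exact part (gradient of a node potential) plus a cycle residual.
import numpy as np

# Transitions (s, s') with an observed TD error on each edge; the cycle
# 0 -> 1 -> 2 -> 0 has a nonzero error sum, so it is not fully integrable.
edges = [(0, 1), (1, 2), (2, 0), (1, 3)]
td_err = np.array([1.0, 0.5, -1.2, 0.3])
n_states = 4

# Incidence matrix D: (D @ phi)[e] = phi[s'] - phi[s], the discrete
# "gradient" of the potential phi along edge e = (s, s').
D = np.zeros((len(edges), n_states))
for e, (s, t) in enumerate(edges):
    D[e, s] = -1.0
    D[e, t] = +1.0

# Least-squares potential: its gradient is the integrable (exact) part.
phi, *_ = np.linalg.lstsq(D, td_err, rcond=None)
exact = D @ phi
residual = td_err - exact  # topological component, supported on cycles

# By construction the residual is orthogonal to every gradient field.
print(residual)            # uniform 0.1 around the 0-1-2 cycle, 0 elsewhere
```

In the paper's setting the tabular potential `phi` would be replaced by a neural potential network trained to minimize the same residual norm.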