On the Importance of Multistability for Horizon Generalization in Reinforcement Learning

📅 2026-05-12
📈 Citations: 0
Influential: 0
📄 PDF

career value

230K/year
🤖 AI Summary
This work addresses the challenges of poor temporal horizon generalization, low sample efficiency, and high exploration costs in reinforcement learning within partially observable Markov decision processes (POMDPs). It formally introduces the notion of “temporal horizon generalization” and proposes an analytical framework grounded in dynamical systems theory. The study demonstrates that multistability is a necessary condition for achieving such generalization, while complex tasks further require transient dynamics. Conventional parallelizable recurrent neural networks (RNNs), being inherently monostable, fail to meet these requirements. By incorporating nonlinear, parallelizable recurrent architectures endowed with both multistability and transient capabilities—such as state-space models and gated linear RNNs—the proposed approach significantly enhances policy generalization across time scales, as validated empirically on both simple and complex tasks.
📝 Abstract
In reinforcement learning (RL), agents acting in partially observable Markov decision processes (POMDPs) must rely on memory, typically encoded in a recurrent neural network (RNN), to integrate information from past observations. Long-horizon POMDPs, in which the relevant observation and the optimal action are separated by many time steps (called the horizon), are particularly challenging: training suffers from poor generalization, severe sample inefficiency, and prohibitive exploration costs. Ideally, an agent trained on short horizons would retain optimal behavior at arbitrarily longer ones, but no formal framework currently characterizes when this is achievable. To fill this gap, we formalized temporal horizon generalization, the property that a policy remains optimal for all horizons, derived a necessary and sufficient condition for it, and experimentally evaluated the ability of nonlinear and parallelizable RNN variants to achieve it. This paper presents the resulting theoretical framework, the empirical evaluation, and the dynamical interpretation linking RNN behavior to temporal horizon generalization. Our analyses reveal that multistability is necessary for temporal horizon generalization and, in simple tasks, sufficient; more complex tasks further require transient dynamics. In contrast, modern parallelizable architectures, namely state space models and gated linear RNNs, are monostable by construction and consequently fail to generalize across temporal horizons. We conclude that multistability and transient dynamics are two essential and complementary dynamical regimes for horizon generalization, and that no current parallelizable RNN exhibits both. Designing parallelizable architectures that combine these regimes thus emerges as a key direction for scalable long-horizon RL.
Problem

Research questions and friction points this paper is trying to address.

horizon generalization
multistability
reinforcement learning
POMDP
temporal dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

multistability
temporal horizon generalization
recurrent neural networks
transient dynamics
partially observable MDPs