Neural Co-state Policies: Structuring Hidden States in Recurrent Reinforcement Learning

πŸ“… 2026-05-06
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF

career value

250K/year
πŸ€– AI Summary
This work addresses the insufficient robustness of policies in partially observable environments caused by the uninterpretable nature of hidden states in recurrent reinforcement learning. It establishes, for the first time, a correspondence between the hidden states of recurrent policies and the adjoint (costate) variables in Pontryagin’s Minimum Principle (PMP). By introducing a PMP-based costate loss, the method explicitly constrains the internal dynamics of the network, rendering the readout layer interpretable as performing Hamiltonian minimization. Evaluated on partially observable continuous-control tasks from the DeepMind Control Suite, the proposed approach matches or exceeds current state-of-the-art baselines and demonstrates significantly enhanced robustness under zero-shot out-of-domain sensor occlusion.
πŸ“ Abstract
A key capability of intelligent agents is operating under partial observability: reasoning and acting effectively despite missing or incomplete state observations. While recurrent (memory-based) policies learned via reinforcement learning address this by encoding history into latent state representations, their internal dynamics remain uninterpretable black boxes. This paper establishes a formal link between these hidden states and the Pontryagin minimum principle (PMP) from optimal control. We demonstrate that for standard recurrent architectures, latent representations map directly to PMP co-states, which allows the readout layer to be interpreted as performing Hamiltonian minimization. Because standard reward maximization does not naturally discover this alignment, we introduce a PMP-derived co-state loss to explicitly structure the internal dynamics. Empirically, this approach matches or improves performance on partially observable DMControl tasks, and is robust against zero-shot out-of-distribution sensor masking. By framing recurrent networks as dynamic processes governed by the minimum principle, we provide a principled approach to designing robust continuous control policies.
Problem

Research questions and friction points this paper is trying to address.

partial observability
recurrent reinforcement learning
hidden states
Pontryagin minimum principle
co-states
Innovation

Methods, ideas, or system contributions that make the work stand out.

co-state
Pontryagin minimum principle
recurrent reinforcement learning
partial observability
structured latent dynamics