Neural Co-state Policies: Structuring Hidden States in Recurrent Reinforcement Learning

📅 2026-05-06

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work targets two linked problems in recurrent reinforcement learning under partial observability: the hidden states of recurrent policies are uninterpretable, and the resulting policies lack robustness. It establishes a correspondence between the hidden states of recurrent policies and the co-state (adjoint) variables of Pontryagin's minimum principle (PMP). The method introduces a PMP-derived co-state loss that explicitly constrains the network's internal dynamics, so that the readout layer can be interpreted as performing Hamiltonian minimization. On partially observable continuous-control tasks from the DeepMind Control Suite, the approach matches or exceeds state-of-the-art baselines and shows improved robustness under zero-shot out-of-distribution sensor masking.
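
For reference, the standard discrete-time form of the minimum principle that the co-state correspondence builds on can be written as follows. The notation (dynamics f, stage cost ℓ, terminal cost ℓ_T, co-state λ_t) is assumed here and may differ from the paper's own. The argmin condition additionally assumes H_t is convex in u; without that, discrete-time PMP only guarantees the stationarity condition ∂H_t/∂u = 0.

```latex
\begin{aligned}
H_t(x_t, u_t, \lambda_{t+1}) &= \ell(x_t, u_t) + \lambda_{t+1}^\top f(x_t, u_t) \\
x_{t+1} &= f(x_t, u_t) \\
\lambda_t &= \nabla_x H_t(x_t, u_t, \lambda_{t+1})
  = \nabla_x \ell(x_t, u_t) + \Big(\tfrac{\partial f}{\partial x}(x_t, u_t)\Big)^{\!\top} \lambda_{t+1},
  \qquad \lambda_T = \nabla_x \ell_T(x_T) \\
u_t^\star &\in \arg\min_{u} \, H_t(x_t, u, \lambda_{t+1})
\end{aligned}
```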

📝 Abstract

A key capability of intelligent agents is operating under partial observability: reasoning and acting effectively despite missing or incomplete state observations. While recurrent (memory-based) policies learned via reinforcement learning address this by encoding history into latent state representations, their internal dynamics remain uninterpretable black boxes. This paper establishes a formal link between these hidden states and the Pontryagin minimum principle (PMP) from optimal control. We demonstrate that for standard recurrent architectures, latent representations map directly to PMP co-states, which allows the readout layer to be interpreted as performing Hamiltonian minimization. Because standard reward maximization does not naturally discover this alignment, we introduce a PMP-derived co-state loss to explicitly structure the internal dynamics. Empirically, this approach matches or improves performance on partially observable DMControl tasks, and is robust against zero-shot out-of-distribution sensor masking. By framing recurrent networks as dynamic processes governed by the minimum principle, we provide a principled approach to designing robust continuous control policies.
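
To make "readout as Hamiltonian minimization" and "co-state loss" concrete, here is a minimal sketch in a fully known linear-quadratic toy problem. This is an assumption-laden illustration and not the paper's method. The dynamics x_{t+1} = A x_t + B u_t, the quadratic costs, and the helper names `lqr_rollout`, `hamiltonian_readout`, and `costate_loss` are all hypothetical. In this setting the true co-states are λ_t = P_t x_t from the Riccati recursion, so a linear readout u_t = -R⁻¹Bᵀλ_{t+1} minimizes H_t. The residual penalty, which uses the true states x_t that a partially observed agent would not have, is one plausible way to score how well a sequence such as recurrent hidden states satisfies the backward co-state recursion.

```python
import numpy as np

# Toy linear-quadratic setting (assumed for illustration only):
#   x_{t+1} = A x_t + B u_t,  l(x, u) = 0.5 x^T Q x + 0.5 u^T R u,  terminal 0.5 x^T Qf x.

def lqr_rollout(A, B, Q, R, Qf, x0, T):
    """Solve finite-horizon LQR with the backward Riccati recursion and roll it out.
    Returns states x_0..x_T, controls u_0..u_{T-1}, and co-states lambda_t = P_t x_t."""
    P = [None] * (T + 1)
    K = [None] * T
    P[T] = Qf
    for t in range(T - 1, -1, -1):
        S = R + B.T @ P[t + 1] @ B
        K[t] = np.linalg.solve(S, B.T @ P[t + 1] @ A)   # feedback gain
        P[t] = Q + A.T @ P[t + 1] @ (A - B @ K[t])       # Riccati update
    xs, us = [x0], []
    for t in range(T):
        u = -K[t] @ xs[t]
        us.append(u)
        xs.append(A @ xs[t] + B @ u)
    lams = [P[t] @ xs[t] for t in range(T + 1)]
    return np.array(xs), np.array(us), np.array(lams)


def hamiltonian_readout(lam_next, B, R):
    """Linear readout minimizing H_t over u: solves R u + B^T lambda_{t+1} = 0."""
    return -np.linalg.solve(R, B.T @ lam_next)


def costate_loss(h, xs, A, Q, Qf):
    """Hypothetical co-state residual: mean squared violation of the backward
    recursion lambda_t = Q x_t + A^T lambda_{t+1}, lambda_T = Qf x_T, when the
    sequence h[0..T] (e.g. recurrent hidden states) is read as co-states."""
    T = len(h) - 1
    res = [h[t] - (Q @ xs[t] + A.T @ h[t + 1]) for t in range(T)]
    res.append(h[T] - Qf @ xs[T])
    return float(np.mean([r @ r for r in res]))


# Usage on a double integrator (illustrative numbers, not taken from the paper).
T = 20
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R, Qf = np.eye(2), np.array([[0.5]]), 10.0 * np.eye(2)
xs, us, lams = lqr_rollout(A, B, Q, R, Qf, np.array([1.0, 0.0]), T)

# Reading out controls from the co-states reproduces the optimal LQR controls.
u_readout = np.array([hamiltonian_readout(lams[t + 1], B, R) for t in range(T)])
print(np.allclose(u_readout, us))  # True

# True co-states satisfy the recursion (loss at round-off level); random vectors do not.
loss_true = costate_loss(lams, xs, A, Q, Qf)
rng = np.random.default_rng(0)
loss_random = costate_loss(rng.normal(size=lams.shape), xs, A, Q, Qf)
print(loss_true, loss_random)
```

The gap between the two loss values illustrates the abstract's point: an arbitrary hidden-state sequence need not obey the co-state dynamics, so an explicit penalty is one way to push it toward them.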

Problem

Research questions and friction points this paper is trying to address.

partial observability

recurrent reinforcement learning

hidden states

Pontryagin minimum principle

co-states

Innovation

Methods, ideas, or system contributions that make the work stand out.

co-state

Pontryagin minimum principle

recurrent reinforcement learning