Structuring Value Representations via Geometric Coherence in Markov Decision Processes

📅 2026-02-03
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses instability and low sample efficiency in value function estimation for reinforcement learning by taking a novel order-theoretic perspective. The authors formulate value learning as the problem of learning a partially ordered set (poset) and propose the GCR-RL framework, which progressively refines a sequence of super-posets, guided by temporal difference signals, to keep the value representations geometrically coherent. Building on this foundation, they develop two algorithms, one based on Q-learning and one on actor-critic, and analyze their theoretical properties and convergence rates. Empirical evaluations on a range of tasks show significant improvements in sample efficiency and training stability over strong baselines.

📝 Abstract
Geometric properties can be leveraged to stabilize and speed up reinforcement learning (RL). Existing examples include encoding symmetry structure, geometry-aware data augmentation, and enforcing structural restrictions. In this paper, we take a novel view of RL through the lens of order theory and recast value function estimation as learning a desired poset (partially ordered set). We propose GCR-RL (Geometric Coherence Regularized Reinforcement Learning), which computes a sequence of super-poset refinements by refining the posets from previous steps and learning additional order relationships from temporal difference signals, thus ensuring geometric coherence across the sequence of posets underpinning the learned value functions. We develop two novel algorithms, one based on Q-learning and one on actor-critic, to realize these super-poset refinements efficiently, and we analyze their theoretical properties and convergence rates. We empirically evaluate GCR-RL on a range of tasks and demonstrate significant improvements in sample efficiency and more stable performance than strong baselines.
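
The abstract does not spell out the algorithms, so the following is only a minimal illustrative sketch of what a "super-poset refinement" and an order-based regularizer could look like. It is not the authors' method. The function names (`reachable`, `refine`, `coherence_penalty`), the hinge-style penalty, and the convention that a pair `(lo, hi)` means value(lo) <= value(hi) are all assumptions made for illustration. How candidate order pairs would be derived from temporal difference signals is not described in the abstract, so they are taken here as a given input.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

# Assumed convention: a pair (lo, hi) asserts value(lo) <= value(hi).
Pair = Tuple[Hashable, Hashable]


def reachable(relations: Set[Pair], src: Hashable, dst: Hashable) -> bool:
    """Depth-first search: can dst be reached from src along lo -> hi pairs?"""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(hi for lo, hi in relations if lo == node)
    return False


def refine(relations: Set[Pair], candidates: Iterable[Pair]) -> Set[Pair]:
    """Return a superset of `relations` that is still a valid strict order.

    A candidate (lo, hi) is accepted only if hi does not already precede lo,
    so the refined relation never contains a cycle. Earlier relations are
    never dropped, which is what makes the result a super-poset of the input.
    """
    refined = set(relations)
    for lo, hi in candidates:
        if lo != hi and not reachable(refined, hi, lo):
            refined.add((lo, hi))
    return refined


def coherence_penalty(
    values: Dict[Hashable, float], relations: Set[Pair], margin: float = 0.0
) -> float:
    """Hypothetical hinge penalty: sum of violations of value(lo) <= value(hi)."""
    return sum(max(0.0, values[lo] - values[hi] + margin) for lo, hi in relations)


if __name__ == "__main__":
    poset = {("a", "b")}
    # ("c", "a") would close the cycle a -> b -> c -> a, so it is rejected.
    poset = refine(poset, [("b", "c"), ("c", "a")])
    values = {"a": 1.0, "b": 0.5, "c": 2.0}
    print(sorted(poset), coherence_penalty(values, poset))
```

In a training loop, a penalty of this kind would presumably be added to the usual temporal difference loss, with `refine` called as new order relationships are learned. That wiring is likewise an assumption and is left out of the sketch.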
Problem

Research questions and friction points this paper is trying to address.

value representation
geometric coherence
Markov Decision Processes
reinforcement learning
partially ordered set
Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometric Coherence
Partially Ordered Set (Poset)
Super-poset Refinement
Order Theory
Reinforcement Learning