Structuring Value Representations via Geometric Coherence in Markov Decision Processes

📅 2026-02-03

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

This work addresses instability and low sample efficiency in value function estimation for reinforcement learning by taking a novel order-theoretic perspective. The authors formulate value learning as the problem of learning a partially ordered set (poset) and propose the GCR-RL framework, which progressively refines a super-poset structure guided by temporal difference signals to keep the value representations geometrically coherent. Building on this foundation, they develop two algorithms, one based on Q-learning and one on actor-critic, and analyze their theoretical properties and convergence rates. Empirical evaluations on a range of tasks show significant improvements in sample efficiency and training stability over strong baselines.

📝 Abstract

Geometric properties can be leveraged to stabilize and speed up reinforcement learning. Existing examples include encoding symmetry structure, geometry-aware data augmentation, and enforcing structural restrictions. In this paper, we take a novel view of RL through the lens of order theory and recast value function estimation as learning a desired poset (partially ordered set). We propose GCR-RL (Geometric Coherence Regularized Reinforcement Learning), which computes a sequence of super-poset refinements by refining the posets from previous steps and learning additional order relationships from temporal difference signals, thus ensuring geometric coherence across the sequence of posets underpinning the learned value functions. Two novel algorithms, one based on Q-learning and one on actor-critic, are developed to efficiently realize these super-poset refinements. Their theoretical properties and convergence rates are analyzed. We empirically evaluate GCR-RL on a range of tasks and demonstrate significant improvements in sample efficiency and stable performance over strong baselines.
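
To make the idea of a super-poset refinement concrete, here is a minimal Python sketch. It is not the paper's algorithm: the abstract does not specify how order relationships are extracted from temporal difference signals, so the rule used here (order two states when their one-step TD targets differ by more than a margin) is an assumption made purely for illustration. The names `Poset`, `td_targets`, `refine`, `gamma`, and `margin` are likewise hypothetical.

```python
from itertools import product


class Poset:
    """Finite strict partial order, stored as a transitively closed set of pairs."""

    def __init__(self, elements, less=()):
        self.elements = set(elements)
        self.less = set(less)  # (a, b) means a is ordered strictly below b

    def add_relation(self, a, b):
        """Add a < b and close transitively; refuse if it contradicts the order."""
        if a == b or (b, a) in self.less:
            return False
        # Everything at or below a must end up below everything at or above b.
        below = {x for x in self.elements if (x, a) in self.less} | {a}
        above = {y for y in self.elements if (b, y) in self.less} | {b}
        self.less |= set(product(below, above))
        return True


def td_targets(values, transitions, gamma=0.9):
    """One-step TD target r + gamma * V(s') for each (s, r, s') transition."""
    return {s: r + gamma * values[s_next] for s, r, s_next in transitions}


def refine(poset, targets, margin=0.1):
    """Return a super-poset: all old relations are kept, new ones may be added.

    Assumed rule (illustrative only): order a below b when the TD target of b
    exceeds that of a by more than `margin`.
    """
    refined = Poset(poset.elements, poset.less)
    for a, b in product(targets, repeat=2):
        if targets[b] - targets[a] > margin:
            refined.add_relation(a, b)
    return refined


# Toy chain A -> B -> C, where C loops on itself with reward 1.
values = {"A": 0.0, "B": 0.0, "C": 1.0}
p0 = Poset(values)  # no order relations yet
p1 = refine(p0, td_targets(values, [("B", 0.0, "C"), ("C", 1.0, "C")]))
p2 = refine(p1, td_targets(values, [("A", 0.0, "B"), ("B", 0.0, "C"), ("C", 1.0, "C")]))
```

In this sketch each refinement copies the previous relation set before adding to it, and `add_relation` rejects any pair that would contradict an existing relation, so every poset in the sequence contains the one before it. Pairs whose targets differ by no more than the margin stay incomparable, which is what makes the order partial rather than total.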

Problem

Research questions and friction points this paper is trying to address.

value representation

geometric coherence

Markov Decision Processes

reinforcement learning

partially ordered set

Innovation

Methods, ideas, or system contributions that make the work stand out.

Geometric Coherence

Partially Ordered Set (Poset)

Super-poset Refinement