Shift Before You Learn: Enabling Low-Rank Representations in Reinforcement Learning

📅 2025-09-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work challenges the low-rank assumption that many reinforcement learning algorithms, particularly reward-free and goal-conditioned methods, implicitly place on the successor measure, remarking that the successor measure itself is not low-rank. Method: The paper shows that low-rank structure emerges naturally in the shifted successor measure, which captures the system dynamics after bypassing a few initial transitions. It gives finite-sample guarantees for entry-wise estimation of a low-rank approximation of the shifted successor measure from sampled entries, and shows that both the approximation and estimation errors are governed primarily by the spectral recoverability of the corresponding matrix. To bound this parameter, it derives a new class of functional inequalities for Markov chains, called Type II Poincaré inequalities, which quantify the shift needed and relate it to the decay of the high-order singular values and to the local mixing properties of the underlying dynamics. Results: Experiments support the theory and show that shifting the successor measure improves performance in goal-conditioned RL.

📝 Abstract
Low-rank structure is a common implicit assumption in many modern reinforcement learning (RL) algorithms. For instance, reward-free and goal-conditioned RL methods often presume that the successor measure admits a low-rank representation. In this work, we challenge this assumption by first remarking that the successor measure itself is not low-rank. Instead, we demonstrate that a low-rank structure naturally emerges in the shifted successor measure, which captures the system dynamics after bypassing a few initial transitions. We provide finite-sample performance guarantees for the entry-wise estimation of a low-rank approximation of the shifted successor measure from sampled entries. Our analysis reveals that both the approximation and estimation errors are primarily governed by the so-called spectral recoverability of the corresponding matrix. To bound this parameter, we derive a new class of functional inequalities for Markov chains that we call Type II Poincaré inequalities and from which we can quantify the amount of shift needed for effective low-rank approximation and estimation. This analysis shows in particular that the required shift depends on the decay of the high-order singular values of the shifted successor measure and is hence typically small in practice. Additionally, we establish a connection between the necessary shift and the local mixing properties of the underlying dynamical system, which provides a natural way of selecting the shift. Finally, we validate our theoretical findings with experiments, and demonstrate that shifting the successor measure indeed leads to improved performance in goal-conditioned RL.
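
The effect described in the abstract can be illustrated on a toy chain. The sketch below is not the paper's algorithm: it assumes a normalized successor measure M = (1 - γ) Σ_t γ^t P^t, takes the shifted version to be P^k M, uses a lazy random walk on a ring as hypothetical dynamics, and measures rank-r error in Frobenius norm via a full SVD rather than the entry-wise, sample-based estimation the paper analyzes. All function names and parameter values are illustrative choices.

```python
import numpy as np

def lazy_ring_walk(n):
    """Transition matrix of a lazy random walk on a ring of n states (toy dynamics)."""
    P = np.zeros((n, n))
    for s in range(n):
        P[s, s] = 0.5                 # stay put with probability 1/2
        P[s, (s - 1) % n] += 0.25     # step left
        P[s, (s + 1) % n] += 0.25     # step right
    return P

def successor_measure(P, gamma):
    """Assumed definition: (1 - gamma) * sum_t gamma^t P^t = (1 - gamma) (I - gamma P)^{-1}."""
    n = P.shape[0]
    return (1.0 - gamma) * np.linalg.inv(np.eye(n) - gamma * P)

def shifted_successor_measure(P, gamma, k):
    """Assumed definition of the shift: skip the first k transitions, i.e. P^k M."""
    return np.linalg.matrix_power(P, k) @ successor_measure(P, gamma)

def rank_r_relative_error(M, r):
    """Relative Frobenius error of the best rank-r approximation (Eckart-Young)."""
    s = np.linalg.svd(M, compute_uv=False)
    return np.sqrt(np.sum(s[r:] ** 2) / np.sum(s ** 2))

# Illustrative parameters, not taken from the paper.
n, gamma, r = 50, 0.9, 5
P = lazy_ring_walk(n)
for k in (0, 1, 3, 6):
    err = rank_r_relative_error(shifted_successor_measure(P, gamma, k), r)
    print(f"shift k={k}: relative rank-{r} error = {err:.4f}")
```

On this symmetric toy chain the relative error at k = 0 (the unshifted measure) should be the largest of the printed values, because the shift multiplies each spectral mode by a power of its eigenvalue and so suppresses the fast-decaying modes. How large a shift is needed in general is what the paper's Type II Poincaré inequalities are meant to quantify.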
Problem

Research questions and friction points this paper is trying to address.

Challenges low-rank assumption in successor measure representations
Proves low-rank structure emerges in shifted successor measure
Establishes connection between shift amount and system mixing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shifted successor measure enables low-rank representation
Type II Poincaré inequalities quantify required shift
Shift connects to local mixing properties of dynamics
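
A short worked computation suggests why a shift can help. It is only a sketch under simplifying assumptions that are not taken from the paper: the successor measure is written in normalized form, the shift is modeled as left-multiplication by P^k, and P is assumed symmetric so that singular values can be read off from eigenvalues.

```latex
% Assumed definitions; the paper's exact convention may differ.
\[
M = (1-\gamma)\sum_{t \ge 0} \gamma^{t} P^{t} = (1-\gamma)\,(I - \gamma P)^{-1},
\qquad
M_k = P^{k} M = (1-\gamma)\sum_{t \ge 0} \gamma^{t} P^{t+k}.
\]
% If P is symmetric with eigenvalues \lambda_1, \dots, \lambda_n \in [-1, 1],
% the singular values (indexed by eigenvalue, not sorted) are
\[
\sigma_i(M) = \frac{1-\gamma}{1 - \gamma \lambda_i} \;\ge\; \frac{1-\gamma}{1+\gamma},
\qquad
\sigma_i(M_k) = \frac{(1-\gamma)\,|\lambda_i|^{k}}{1 - \gamma \lambda_i}.
\]
```

Under these assumptions every singular value of M is bounded away from zero, so M is never exactly low-rank, while the singular values of M_k attached to eigenvalues with small |λ_i| shrink geometrically in k. Modes with small |λ_i| are those that mix quickly, which is consistent with the connection to local mixing stated above.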
Bastien Dubail
KTH, Stockholm, Sweden
Stefan Stojanovic
KTH Royal Institute of Technology, Sweden
Reinforcement Learning, Machine Learning
Alexandre Proutière
KTH, Stockholm, Sweden