🤖 AI Summary
This paper addresses the limited generalization and robustness of policies in reinforcement learning by introducing a novel paradigm that optimizes for *trajectory-level behavioral simplicity*. The core methodological innovation is the first use of *maximizing total trajectory correlation* as an episode-level regularizer, which encourages policies to produce periodic, compressible, and intrinsically simple action sequences. To realize this objective, the authors propose an end-to-end differentiable algorithm based on a variational lower bound, jointly optimizing the policy network and state representation while incorporating a trajectory-level information bottleneck. Evaluated on simulated robotic control tasks, the approach significantly improves policy robustness against observation noise and dynamics perturbations, without sacrificing, and often improving, task performance. Crucially, it naturally induces structured, low-complexity behavioral patterns, demonstrating emergent behavioral regularization through correlation-aware trajectory optimization.
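To build intuition for the quantity being maximized: the total correlation of a trajectory measures how much the per-step variables depend on each other, TC(X_1, …, X_T) = Σ_t H(X_t) − H(X_1, …, X_T), and it is large for periodic, predictable sequences and near zero for independent noise. The sketch below is not the paper's variational algorithm; it is a minimal illustrative estimator that assumes a Gaussian approximation (where TC has the closed form ½(Σ_t log Σ_tt − log det Σ)) and the function name `gaussian_total_correlation` is our own.

```python
import numpy as np

def gaussian_total_correlation(trajectories):
    """Estimate total correlation of T-step trajectories under a
    Gaussian approximation. `trajectories` has shape (N, T): N sampled
    episodes of a scalar action over T steps. For a Gaussian, TC equals
    0.5 * (sum of log marginal variances - log det of the joint covariance),
    which is >= 0 and zero iff the timesteps are uncorrelated."""
    cov = np.cov(trajectories, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])  # numerical stability
    sum_log_marginals = np.sum(np.log(np.diag(cov)))
    _, log_det_joint = np.linalg.slogdet(cov)
    return 0.5 * (sum_log_marginals - log_det_joint)

rng = np.random.default_rng(0)
T, N = 8, 2000
# Periodic trajectories (sine waves with random phase) vs. i.i.d. noise.
phases = rng.uniform(0, 2 * np.pi, size=(N, 1))
periodic = np.sin(np.arange(T) + phases) + 0.05 * rng.standard_normal((N, T))
noise = rng.standard_normal((N, T))
# Periodic behavior yields much higher total correlation than noise.
print(gaussian_total_correlation(periodic), gaussian_total_correlation(noise))
```

A policy rewarded for high total correlation is thus pushed toward the periodic regime, which is the behavioral-simplicity effect the paper reports.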
📝 Abstract
Simplicity is a powerful inductive bias. In reinforcement learning, regularization is used for simpler policies, data augmentation for simpler representations, and sparse reward functions for simpler objectives, all with the underlying motivation of increasing generalizability and robustness by focusing on the essentials. Complementary to these techniques, we investigate how to promote simple behavior throughout the episode. To that end, we introduce a modification of the reinforcement learning problem that additionally maximizes the total correlation within the induced trajectories. We propose a practical algorithm that optimizes all models, including policy and state representation, based on a lower-bound approximation. In simulated robot environments, our method naturally generates policies that induce periodic and compressible trajectories, and that exhibit superior robustness to noise and changes in dynamics compared to baseline methods, while also improving performance in the original tasks.