Maximum Total Correlation Reinforcement Learning

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the limited generalization and robustness of policies in reinforcement learning by introducing a novel paradigm that optimizes for *trajectory-level behavioral simplicity*. The core methodological innovation is the first use of *maximizing total trajectory correlation* as an episode-level regularizer, which encourages policies to produce periodic, compressible, and intrinsically simple action sequences. To realize this objective, the authors propose an end-to-end differentiable algorithm based on a variational lower bound, jointly optimizing the policy network and state representation while incorporating a trajectory-level information bottleneck. Evaluated on simulated robotic control tasks, the approach significantly improves policy robustness against observation noise and dynamics perturbations, without sacrificing task performance and often improving it. Crucially, it naturally induces structured, low-complexity behavioral patterns, demonstrating emergent behavioral regularization through correlation-aware trajectory optimization.

📝 Abstract
Simplicity is a powerful inductive bias. In reinforcement learning, regularization is used for simpler policies, data augmentation for simpler representations, and sparse reward functions for simpler objectives, all with the underlying motivation of increasing generalizability and robustness by focusing on the essentials. Complementary to these techniques, we investigate how to promote simple behavior throughout the episode. To that end, we introduce a modification of the reinforcement learning problem that additionally maximizes the total correlation within the induced trajectories. We propose a practical algorithm that optimizes all models, including policy and state representation, based on a lower-bound approximation. In simulated robot environments, our method naturally generates policies that induce periodic and compressible trajectories, and that exhibit superior robustness to noise and changes in dynamics compared to baseline methods, while also improving performance in the original tasks.
Problem

Research questions and friction points this paper is trying to address.

Promote simple behavior in reinforcement learning episodes
Maximize total correlation within induced trajectories
Improve robustness and performance in dynamic environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Maximizes total correlation within trajectories
Optimizes policy and state representation jointly
Generates periodic and compressible trajectories
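As rough intuition for the objective above: the total correlation of a trajectory, TC(a_1, …, a_T) = Σ_t H(a_t) − H(a_1, …, a_T), is large when individual steps look varied in isolation but the trajectory as a whole is predictable, e.g. periodic. The sketch below estimates this quantity from empirical counts over discrete-action trajectories; it is an illustration only, not the paper's method (which maximizes a variational lower bound with learned models), and the function names `entropy` and `total_correlation` are ours.

```python
import numpy as np
from collections import Counter

def entropy(samples):
    """Empirical Shannon entropy (in nats) of a sequence of hashable samples."""
    counts = Counter(samples)
    n = len(samples)
    probs = np.array([c / n for c in counts.values()])
    return float(-np.sum(probs * np.log(probs)))

def total_correlation(trajectories):
    """Estimate TC(a_1..a_T) = sum_t H(a_t) - H(a_1, ..., a_T)
    from a list of equal-length discrete-action trajectories."""
    T = len(trajectories[0])
    sum_marginals = sum(
        entropy([traj[t] for traj in trajectories]) for t in range(T)
    )
    joint = entropy([tuple(traj) for traj in trajectories])
    return sum_marginals - joint
```

Periodic trajectories that differ only in phase score high (marginals stay spread out while the joint collapses onto a few patterns), whereas trajectories of independently random actions score near zero, which is why maximizing this quantity pushes the policy toward compressible, repetitive behavior.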
Bang You
Department of Computer Science, Tsinghua University, Beijing, China
Puze Liu
German Research Center for AI (DFKI)
Robotics, Reinforcement Learning, Robot Learning
Huaping Liu
Professor of Electrical Engineering, Oregon State University
Communication theory, wireless communications, signal processing, sensor networks, information security
Jan Peters
Intelligent Autonomous Systems Lab, Technische Universität Darmstadt, Darmstadt, Germany; Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Germany; Hessian Centre for Artificial Intelligence (Hessian.AI); Centre for Cognitive Science (CogSci)
Oleg Arenz
Postdoctoral Researcher, Technische Universitaet Darmstadt
Autonomous Robots, Inverse Reinforcement Learning, Variational Inference, Reinforcement Learning