🤖 AI Summary
This paper addresses the limited generalization and robustness of policies in reinforcement learning by introducing a novel paradigm that optimizes for *trajectory-level behavioral simplicity*. The core methodological innovation is the first use of *maximizing total trajectory correlation* as an episode-level regularizer, which encourages policies to produce periodic, compressible, and intrinsically simple action sequences. To realize this objective, the authors propose an end-to-end differentiable algorithm based on a variational lower bound, jointly optimizing the policy network and state representation while incorporating a trajectory-level information bottleneck. Evaluated on simulated robotic control tasks, the approach significantly improves policy robustness against observation noise and dynamics perturbations, without sacrificing, and often improving, task performance. Crucially, it naturally induces structured, low-complexity behavioral patterns, demonstrating emergent behavioral regularization through correlation-aware trajectory optimization.
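To build intuition for the quantity being maximized: the total correlation of a trajectory measures how much the per-step variables depend on each other, TC(X_1, …, X_T) = Σ_t H(X_t) − H(X_1, …, X_T), and it is large for periodic, predictable sequences and near zero for independent noise. The sketch below is not the paper's variational algorithm; it is a minimal illustrative estimator that assumes a Gaussian approximation (where TC has the closed form ½(Σ_t log Σ_tt − log det Σ)) and the function name `gaussian_total_correlation` is our own.

```python
import numpy as np

def gaussian_total_correlation(trajectories):
    """Estimate total correlation of T-step trajectories under a
    Gaussian approximation. `trajectories` has shape (N, T): N sampled
    episodes of a scalar action over T steps. For a Gaussian, TC equals
    0.5 * (sum of log marginal variances - log det of the joint covariance),
    which is >= 0 and zero iff the timesteps are uncorrelated."""
    cov = np.cov(trajectories, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])  # numerical stability
    sum_log_marginals = np.sum(np.log(np.diag(cov)))
    _, log_det_joint = np.linalg.slogdet(cov)
    return 0.5 * (sum_log_marginals - log_det_joint)

rng = np.random.default_rng(0)
T, N = 8, 2000
# Periodic trajectories (sine waves with random phase) vs. i.i.d. noise.
phases = rng.uniform(0, 2 * np.pi, size=(N, 1))
periodic = np.sin(np.arange(T) + phases) + 0.05 * rng.standard_normal((N, T))
noise = rng.standard_normal((N, T))
# Periodic behavior yields much higher total correlation than noise.
print(gaussian_total_correlation(periodic), gaussian_total_correlation(noise))
```

A policy rewarded for high total correlation is thus pushed toward the periodic regime, which is the behavioral-simplicity effect the paper reports.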
📝 Abstract
Simplicity is a powerful inductive bias. In reinforcement learning, regularization is used for simpler policies, data augmentation for simpler representations, and sparse reward functions for simpler objectives, all with the underlying motivation of increasing generalizability and robustness by focusing on the essentials. Complementary to these techniques, we investigate how to promote simple behavior throughout the episode. To that end, we introduce a modification of the reinforcement learning problem that additionally maximizes the total correlation within the induced trajectories. We propose a practical algorithm that optimizes all models, including policy and state representation, based on a lower-bound approximation. In simulated robot environments, our method naturally generates policies that induce periodic and compressible trajectories, and that exhibit superior robustness to noise and changes in dynamics compared to baseline methods, while also improving performance in the original tasks.