🤖 AI Summary
This paper addresses the challenge of jointly optimizing multiple objectives in reinforcement learning: reward maximization, safe exploration, and intrinsic motivation. Methodologically, it introduces the first unified geometric optimization framework that generalizes classical algorithms such as policy mirror descent and natural policy gradient to settings with nonlinear utility functions and convex constraints, combining differential geometry with convex optimization to obtain a trust-region-style nonlinear policy optimization method for deep RL. Theoretically, it uncovers a shared geometric structure underlying multi-objective trade-offs in the space of long-horizon behavior. Algorithmically, it unifies robustness, safety, and exploratory diversity within a single principled formulation. The framework thereby establishes a new theoretical foundation for safe reinforcement learning and efficient exploration, while offering a scalable, modular paradigm for algorithm design.
📝 Abstract
Reward maximization, safe exploration, and intrinsic motivation are often studied as separate objectives in reinforcement learning (RL). We present a unified geometric framework that views these goals as instances of a single optimization problem over the space of achievable long-term behaviors in an environment. Within this framework, classical methods such as policy mirror descent, natural policy gradient, and trust-region algorithms naturally generalize to nonlinear utilities and convex constraints. We illustrate how this perspective captures robustness, safety, exploration, and diversity objectives, and outline open challenges at the interface of geometry and deep RL.
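The abstract's core idea, optimizing a nonlinear utility of a policy's long-term behavior with mirror-descent-style updates, can be sketched concretely. The toy below is not the paper's algorithm, only an illustrative instance under assumed ingredients: a hypothetical 2-state, 2-action MDP, a concave utility `F(d) = <r, d> + tau * H(d)` of the discounted state-action occupancy `d` (reward plus an entropy bonus encoding exploration), and a policy mirror descent step that linearizes `F` at the current occupancy and applies a KL-regularized (exponentiated-gradient) policy update.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP; transition probabilities are made up.
nS, nA, gamma, tau = 2, 2, 0.9, 0.1
P = np.array([                      # P[s, a, s']
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.1, 0.9], [0.7, 0.3]],
])
r = np.array([[1.0, 0.0], [0.0, 0.5]])   # reward r[s, a]
mu0 = np.array([0.5, 0.5])               # initial state distribution

def occupancy(pi):
    """Normalized discounted state-action occupancy d(s, a) of policy pi[s, a]."""
    P_pi = np.einsum('sap,sa->sp', P, pi)               # state transitions under pi
    d_s = np.linalg.solve(np.eye(nS) - gamma * P_pi.T, (1 - gamma) * mu0)
    return d_s[:, None] * pi                            # sums to 1 by construction

def utility(d):
    """Concave utility F(d) = <r, d> + tau * H(d): reward plus entropy bonus."""
    return (r * d).sum() - tau * (d * np.log(d + 1e-12)).sum()

def pmd_step(pi, eta=0.5):
    """One policy-mirror-descent step: linearize F at d, do a KL-mirror update."""
    d = occupancy(pi)
    r_eff = r - tau * (np.log(d + 1e-12) + 1.0)         # gradient of F at d
    # Q-values of the linearized objective under the current policy.
    P_pi = np.einsum('sap,sa->sp', P, pi)
    v = np.linalg.solve(np.eye(nS) - gamma * P_pi, (pi * r_eff).sum(axis=1))
    Q = r_eff + gamma * np.einsum('sap,p->sa', P, v)
    new = pi * np.exp(eta * Q)                          # exponentiated-gradient step
    return new / new.sum(axis=1, keepdims=True)

pi = np.full((nS, nA), 0.5)                             # start from uniform policy
for _ in range(300):
    pi = pmd_step(pi)
```

With a linear utility (`tau = 0`) this reduces to standard policy mirror descent on expected reward; the nonlinear entropy term is what makes the occupancy-space view, rather than a per-state Bellman view, the natural formulation.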