A Pontryagin Perspective on Reinforcement Learning

📅 2024-05-28

🏛️ arXiv.org

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

Reinforcement learning traditionally learns state-dependent policies that solve optimal control problems in closed loop. This work proposes open-loop reinforcement learning, which learns a fixed action sequence instead of a state-feedback policy, so optimization happens over the whole trajectory rather than step by step. The algorithms build on Pontryagin's principle from the theory of open-loop optimal control rather than on Bellman's equation from dynamic programming. The authors present one robust model-based algorithm and two sample-efficient model-free algorithms, all with convergence guarantees. Experiments on a pendulum swing-up task and two high-dimensional MuJoCo tasks show the methods significantly outperforming existing baselines.
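
For orientation, the textbook discrete-time form of Pontryagin's principle can be written as below. The notation (dynamics f_t, running cost c_t, terminal cost c_T, costates λ_t, horizon T) is assumed here for illustration and is not taken from this page; the paper's own formulation may differ in detail.

```latex
% Open-loop problem: choose the action sequence u_0, ..., u_{T-1}
\min_{u_{0:T-1}} \; J(u_{0:T-1}) = \sum_{t=0}^{T-1} c_t(x_t, u_t) + c_T(x_T)
\quad \text{subject to} \quad x_{t+1} = f_t(x_t, u_t), \quad x_0 \text{ given}.

% Costates, computed backward in time along the rolled-out trajectory
\lambda_T = \nabla c_T(x_T), \qquad
\lambda_t = \nabla_x c_t(x_t, u_t)
          + \left(\frac{\partial f_t}{\partial x}(x_t, u_t)\right)^{\!\top} \lambda_{t+1}.

% Gradient of the objective with respect to each action
\nabla_{u_t} J = \nabla_u c_t(x_t, u_t)
               + \left(\frac{\partial f_t}{\partial u}(x_t, u_t)\right)^{\!\top} \lambda_{t+1}.
```

The last line is what makes the action sequence itself a practical optimization variable: one forward rollout and one backward costate pass yield the gradient for every time step.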

📝 Abstract

Reinforcement learning has traditionally focused on learning state-dependent policies to solve optimal control problems in a closed-loop fashion. In this work, we introduce the paradigm of open-loop reinforcement learning where a fixed action sequence is learned instead. We present three new algorithms: one robust model-based method and two sample-efficient model-free methods. Rather than basing our algorithms on Bellman's equation from dynamic programming, our work builds on Pontryagin's principle from the theory of open-loop optimal control. We provide convergence guarantees and evaluate all methods empirically on a pendulum swing-up task, as well as on two high-dimensional MuJoCo tasks, significantly outperforming existing baselines.
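
To make the open-loop idea concrete, here is a minimal sketch of optimizing a fixed action sequence for a pendulum swing-up when the dynamics are known. It is not one of the paper's three algorithms: it is plain gradient descent with a backtracking step size, where the gradient comes from a backward costate recursion in the style of Pontryagin's principle. The pendulum constants, the cost weights, and every function name (`step`, `rollout`, `total_cost`, `adjoint_gradient`, `optimize`) are hypothetical choices made for this illustration.

```python
import math

# Hypothetical pendulum constants, chosen for illustration only.
DT = 0.05                  # Euler step size in seconds
GRAVITY_OVER_LENGTH = 9.81
TORQUE_GAIN = 1.0
CONTROL_WEIGHT = 0.01      # weight on squared torque
ANGLE_WEIGHT = 10.0        # weight on the terminal angle error


def step(state, u):
    """One Euler step; state = (angle from upright, angular velocity)."""
    theta, omega = state
    accel = GRAVITY_OVER_LENGTH * math.sin(theta) + TORQUE_GAIN * u
    return (theta + DT * omega, omega + DT * accel)


def rollout(x0, actions):
    """Apply the action sequence open loop and return all visited states."""
    states = [tuple(x0)]
    for u in actions:
        states.append(step(states[-1], u))
    return states


def total_cost(x0, actions):
    """Control effort plus a terminal penalty for not ending upright and at rest."""
    theta_T, omega_T = rollout(x0, actions)[-1]
    effort = 0.5 * CONTROL_WEIGHT * DT * sum(u * u for u in actions)
    return effort + ANGLE_WEIGHT * (1.0 - math.cos(theta_T)) + 0.5 * omega_T**2


def adjoint_gradient(x0, actions):
    """Gradient of total_cost w.r.t. each action via a backward costate pass."""
    states = rollout(x0, actions)
    theta_T, omega_T = states[-1]
    # Terminal costate: gradient of the terminal cost w.r.t. the final state.
    lam_theta, lam_omega = ANGLE_WEIGHT * math.sin(theta_T), omega_T
    grad = [0.0] * len(actions)
    for t in reversed(range(len(actions))):
        theta = states[t][0]
        # dJ/du_t = d(cost_t)/du_t + (d step / du)^T lambda_{t+1}
        grad[t] = CONTROL_WEIGHT * DT * actions[t] + DT * TORQUE_GAIN * lam_omega
        # lambda_t = (d step / dx)^T lambda_{t+1}; the running cost ignores the state.
        lam_theta, lam_omega = (
            lam_theta + DT * GRAVITY_OVER_LENGTH * math.cos(theta) * lam_omega,
            DT * lam_theta + lam_omega,
        )
    return grad


def optimize(x0, horizon=40, iterations=200):
    """Gradient descent on the action sequence with a backtracking step size."""
    # Nonzero start: zero torque from rest at the bottom is a stationary point.
    actions = [0.5] * horizon
    cost = total_cost(x0, actions)
    for _ in range(iterations):
        grad = adjoint_gradient(x0, actions)
        lr = 1.0
        while lr > 1e-10:
            trial = [u - lr * g for u, g in zip(actions, grad)]
            trial_cost = total_cost(x0, trial)
            if trial_cost < cost:  # accept only steps that lower the cost
                actions, cost = trial, trial_cost
                break
            lr *= 0.5
    return actions, cost


if __name__ == "__main__":
    # Start hanging straight down (angle pi from upright) and at rest.
    plan, final_cost = optimize((math.pi, 0.0))
    print(f"cost of optimized action sequence: {final_cost:.3f}")
```

Note that the result is a list of torques indexed by time, not a function of the state. That is the defining feature of the open-loop setting described in the abstract, and it also means the plan cannot react to disturbances once execution starts.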

Problem

Research questions and friction points this paper is trying to address.

Open-loop reinforcement learning for fixed action sequences

Pontryagin-based algorithms vs Bellman's dynamic programming

Empirical validation on a pendulum swing-up task and two high-dimensional MuJoCo tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-loop reinforcement learning paradigm

Pontryagin's principle-based algorithms

One robust model-based method and two sample-efficient model-free methods
