Chebyshev Policies and the Mountain Car Problem: Reinforcement Learning for Low-Dimensional Control Tasks

📅 2026-05-21

🤖 AI Summary

This work addresses a long-standing gap: the classic reinforcement learning benchmark Mountain Car had no analytical optimal solution. The authors derive its optimal control policy analytically for the first time. Building on this solution, they propose Chebyshev policies, a lightweight, universal (i.e., dense) policy class constructed from first principles via Chebyshev polynomial expansions. These policies are parameter-efficient, interpretable, and fast enough for real-time inference. On Mountain Car, they reduce regret by a factor of 4.18 and use 277× fewer parameters than neural network policies. On further RL tasks, including a real-world nonlinear motion control testbed, they consistently improve performance over neural network policies trained with PPO, ARS, and REINFORCE.
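To make "a policy built from Chebyshev polynomial expansions" concrete, here is a minimal sketch for a 2-D state such as Mountain Car's (position, velocity). It is an illustration under stated assumptions, not the authors' parameterization: the class name ChebyshevPolicy2D, the tensor-product basis, the tanh squashing to an action in [-1, 1], and the state bounds are all our choices. The state is rescaled to [-1, 1], the natural domain of the Chebyshev polynomials T_k, and the action is a weighted sum of products T_i(x) T_j(y), evaluated with NumPy's numpy.polynomial.chebyshev.chebval2d.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb


class ChebyshevPolicy2D:
    """Hypothetical sketch of a deterministic policy from a tensor-product Chebyshev expansion.

    a(s) = tanh( sum_{i, j <= degree} W[i, j] * T_i(x) * T_j(y) ),
    where (x, y) is the 2-D state affinely rescaled to [-1, 1]. The tanh keeps the
    action in [-1, 1]; this action range is an assumption, not taken from the paper.
    """

    def __init__(self, low, high, degree):
        self.low = np.asarray(low, dtype=float)
        self.high = np.asarray(high, dtype=float)
        # One trainable coefficient per pair (T_i, T_j): (degree + 1) ** 2 parameters.
        self.weights = np.zeros((degree + 1, degree + 1))

    @property
    def num_params(self):
        return self.weights.size

    def normalize(self, state):
        # Affine map from [low, high] onto [-1, 1]; clip in case the state leaves the bounds.
        z = 2.0 * (np.asarray(state, dtype=float) - self.low) / (self.high - self.low) - 1.0
        return np.clip(z, -1.0, 1.0)

    def __call__(self, state):
        x, y = self.normalize(state)
        # chebval2d(x, y, c) evaluates sum_{i, j} c[i, j] * T_i(x) * T_j(y).
        return float(np.tanh(cheb.chebval2d(x, y, self.weights)))


# Usage (bounds follow the common Gym/Gymnasium Mountain Car state space; assumed here).
policy = ChebyshevPolicy2D(low=[-1.2, -0.07], high=[0.6, 0.07], degree=3)
print(policy.num_params)  # 16 coefficients for degree 3
policy.weights[0, 1] = 5.0  # hand-set: a smooth "push in the direction of velocity" rule
print(policy([-0.5, 0.05]))
```

The weights here are set by hand only to show the structure. In the paper's setting they would be trained in place of a neural network's weights (for example with PPO, ARS, or REINFORCE). The parameter count grows as (degree + 1)^d with state dimension d for a full tensor-product basis, which is one reason such expansions suit low-dimensional control.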

📝 Abstract

We analytically solve the Mountain Car problem, a canonical benchmark in RL, and derive an optimal control solution, closing a gap after 36 years. This enables us to reveal two surprising insights: the optimal control is quite simple, yet modern RL agents display a large gap to optimality. Motivated by the analysis of the optimal control, we introduce Chebyshev policies as a universal (i.e., dense) class of RL policies from first principles. They can be trained as drop-in replacements of neural nets, reducing the regret by a factor of 4.18, while requiring 277 times fewer parameters, fostering sample efficiency, explainability and real-time capability. Chebyshev policies are evaluated on further RL tasks, including a real-world nonlinear motion control testbed. They consistently improve performance over neural nets with PPO, ARS and REINFORCE. Our results demonstrate how Chebyshev policies offer a compelling and lightweight alternative or addition to neural nets for low-dimensional control tasks.
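For readers unfamiliar with the benchmark, the sketch below simulates the classic discrete-action Mountain Car. It uses the constants of the widely used Gym/Gymnasium MountainCar-v0 environment (force 0.001, gravity term 0.0025 cos(3x), velocity clipped to [-0.07, 0.07], position clipped to [-1.2, 0.6], goal at x >= 0.5). The summary above does not state which variant the paper analyzes, so treat these constants as an assumption. The function names are hypothetical. The "push along the velocity" rule is a well-known hand-written energy-pumping heuristic. It is not the optimal control derived in the paper. It only illustrates that a very simple feedback law can solve a task where an underpowered car must first swing away from the goal.

```python
import math


def mountain_car_step(position, velocity, action):
    """One step of discrete Mountain Car (Gym MountainCar-v0 constants, assumed).

    action is in {-1, 0, +1}: push left, no push, push right.
    Returns (new_position, new_velocity, reached_goal).
    """
    velocity += 0.001 * action - 0.0025 * math.cos(3.0 * position)
    velocity = min(max(velocity, -0.07), 0.07)
    position += velocity  # position is updated with the *new* velocity
    position = min(max(position, -1.2), 0.6)
    if position == -1.2 and velocity < 0.0:
        velocity = 0.0  # inelastic stop at the left wall
    return position, velocity, position >= 0.5


def rollout(policy, position=-0.5, velocity=0.0, max_steps=200):
    """Run policy(position, velocity) -> action; return steps to reach the goal, or None."""
    for t in range(1, max_steps + 1):
        action = policy(position, velocity)
        position, velocity, done = mountain_car_step(position, velocity, action)
        if done:
            return t
    return None


def push_with_velocity(position, velocity):
    # Hypothetical heuristic: push in the direction of motion to pump energy into the swing.
    return 1 if velocity > 0.0 else -1


def no_push(position, velocity):
    # Baseline: the engine alone is too weak, so the car just oscillates in the valley.
    return 0


print(rollout(push_with_velocity))
print(rollout(no_push))  # None: never reaches the goal
```

Pushing along the velocity adds work on every half swing, so the car climbs higher each time until it clears the right hill. With no push, the car stays trapped near the valley floor.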

Problem

Research questions and friction points this paper is trying to address.

Mountain Car

optimal control

reinforcement learning

low-dimensional control

policy gap

Innovation

Methods, ideas, or system contributions that make the work stand out.

Chebyshev policies

optimal control

sample efficiency