Programmatic Reinforcement Learning: Navigating Gridworlds

📅 2024-02-18
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the optimality and learning complexity of programmatic policies in stochastic environments, focusing on Programmatic Reinforcement Learning (PRL) in gridworld domains. Method: We formally define a class of programmatic policies supporting higher-order constructs such as loops, and derive upper bounds on the minimal program size required to express optimal policies. We propose a theoretically grounded and practically viable program synthesis algorithm for PRL, integrating reinforcement learning with program synthesis, and prove its convergence. Our approach combines formal methods, program semantics modeling, and theoretical analysis. Results: We characterize the fundamental optimality and compactness properties of programmatic policies, and empirically validate our algorithm on canonical maze environments, where it synthesizes correct, compact, and generalizable loop-based policies. This work establishes a first systematic theoretical foundation for programmatic RL.
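To make the idea of a loop-based programmatic policy concrete, here is a minimal sketch of one in a toy gridworld. It is illustrative only: the GridWorld class and the policy below are hypothetical and not taken from the paper; they simply show how a few lines of looping code can replace a large state-action table and generalize across grid sizes.

```python
# Illustrative sketch (not the paper's formalism): a compact loop-based policy
# for a deterministic gridworld. Names (GridWorld, programmatic_policy) are
# hypothetical; the policy assumes the start lies below and to the left of the goal.

from dataclasses import dataclass

@dataclass
class GridWorld:
    width: int
    height: int
    goal: tuple  # (x, y) target cell

    def step(self, pos, action):
        """Apply a move action, clipped to the grid boundaries."""
        x, y = pos
        dx, dy = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}[action]
        return (min(max(x + dx, 0), self.width - 1),
                min(max(y + dy, 0), self.height - 1))

def programmatic_policy(env, pos):
    """Walk right until the goal column is reached, then up until the goal row.
    The same short program works for any grid size: this is the compactness
    advantage of loop-based policies over tabular ones."""
    trace = [pos]
    while pos[0] < env.goal[0]:          # loop construct: repeat 'right'
        pos = env.step(pos, "right")
        trace.append(pos)
    while pos[1] < env.goal[1]:          # loop construct: repeat 'up'
        pos = env.step(pos, "up")
        trace.append(pos)
    return trace

env = GridWorld(width=5, height=5, goal=(4, 3))
print(programmatic_policy(env, (0, 0)))  # trace ends at the goal cell (4, 3)
```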

📝 Abstract
The field of reinforcement learning (RL) is concerned with algorithms for learning optimal policies in unknown stochastic environments. Programmatic RL studies representations of policies as programs, meaning representations involving higher-order constructs such as control loops. Despite attracting a lot of attention at the intersection of the machine learning and formal methods communities, very little is known on the theoretical front about programmatic RL: what are good classes of programmatic policies? How large are optimal programmatic policies? How can we learn them? The goal of this paper is to give first answers to these questions, initiating a theoretical study of programmatic RL. Considering a class of gridworld environments, we define a class of programmatic policies. Our main contributions are to place upper bounds on the size of optimal programmatic policies, and to construct an algorithm for synthesizing them. These theoretical findings are complemented by a prototype implementation of the algorithm.
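As an intuition pump for the synthesis side, the following is a hedged sketch of a naive enumerative baseline: enumerate candidate programs in increasing size and return the first one that reaches the goal. The program representation (a list of action / loop-until-blocked instructions) and all names are assumptions made for illustration; this is not the synthesis algorithm constructed in the paper.

```python
# Hedged sketch of enumerative program synthesis for gridworld policies:
# try candidate programs in order of size and keep the smallest one that
# reaches the goal. A generic baseline for intuition only.

from itertools import product

ACTIONS = ["up", "down", "left", "right"]

def run(program, start, goal, width, height, max_steps=100):
    """Interpret a program given as a list of (action, repeat_until_blocked) pairs."""
    x, y = start
    moves = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
    steps = 0
    for action, looped in program:
        while steps < max_steps:
            dx, dy = moves[action]
            nx, ny = x + dx, y + dy
            if not (0 <= nx < width and 0 <= ny < height):
                break                      # blocked by the grid boundary
            x, y = nx, ny
            steps += 1
            if not looped:
                break                      # single-step instruction
    return (x, y) == goal

def synthesize(start, goal, width, height, max_size=4):
    """Return a smallest program (fewest instructions) that reaches the goal."""
    for size in range(1, max_size + 1):
        for actions in product(ACTIONS, repeat=size):
            for loops in product([False, True], repeat=size):
                program = list(zip(actions, loops))
                if run(program, start, goal, width, height):
                    return program
    return None

# Example: finds a 3-instruction program such as
# [('right', True), ('up', True), ('down', False)] on a 5x5 grid.
print(synthesize(start=(0, 0), goal=(4, 3), width=5, height=5))
```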
Problem

Research questions and friction points this paper is trying to address.

Reinforcement Learning
Maze Environment
Optimal Policy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Programmatic Reinforcement Learning
Optimal Policy Complexity
Algorithm for Policy Generation
Guruprerana Shabadi
University of Pennsylvania, United States, University of Warsaw, Poland
Nathanaël Fijalkow
CNRS, LaBRI, Bordeaux, France
Games
Program Synthesis
Théo Matricon
CNRS, LaBRI, Université de Bordeaux, France