InnateCoder: Learning Programmatic Options with Foundation Models

📅 2025-05-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Reinforcement learning (RL) agents outside transfer-learning settings start from scratch, so even the most basic skills require many environment interactions to acquire. Method: InnateCoder distills human prior knowledge from foundation models, in a zero-shot setting, into executable, composable, temporally extended actions ("options") that serve as innate skills within an options-based RL architecture. Rather than discovering options from environment experience, it encodes these priors as modular programs and, integrating program synthesis with policy search, combines them into larger and more complex programmatic policies. Results: On the MicroRTS and Karel benchmarks, InnateCoder achieves substantial gains in sample efficiency over both option-free baselines and variants that learn options from experience, pointing to LLM-grounded, prior-informed skill acquisition as a practical route to more sample-efficient RL.

📝 Abstract
Outside of transfer learning settings, reinforcement learning agents start their learning process from a clean slate. As a result, such agents have to go through a slow process to learn even the most obvious skills required to solve a problem. In this paper, we present InnateCoder, a system that leverages human knowledge encoded in foundation models to provide programmatic policies that encode "innate skills" in the form of temporally extended actions, or options. In contrast to existing approaches to learning options, InnateCoder learns them from the general human knowledge encoded in foundation models in a zero-shot setting, and not from the knowledge the agent gains by interacting with the environment. Then, InnateCoder searches for a programmatic policy by combining the programs encoding these options into larger and more complex programs. We hypothesized that InnateCoder's way of learning and using options could improve the sampling efficiency of current methods for learning programmatic policies. Empirical results in MicroRTS and Karel the Robot support our hypothesis, since they show that InnateCoder is more sample efficient than versions of the system that do not use options or learn them from experience.
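To make the abstract's pipeline concrete, here is a minimal, hypothetical sketch of the search step it describes: options are small programs (plain Python callables standing in for DSL programs) that a foundation model might propose zero-shot, and a stochastic local search combines them into a larger policy. The toy environment, the option names, and the hill-climbing search are all illustrative assumptions, not the paper's actual DSL or search algorithm.

```python
import random

# Illustrative "innate skills" an LLM might propose for a 1-D navigation toy task.
# (Hypothetical stand-ins for the programmatic options the paper describes.)
def move_right(state):
    return state + 1

def move_left(state):
    return state - 1

def stay(state):
    return state

OPTIONS = [move_right, move_left, stay]

def evaluate(policy, start=0, goal=5):
    """Toy episodic return: negative distance to the goal after running the policy."""
    state = start
    for option in policy:
        state = option(state)
    return -abs(goal - state)

def hill_climb(options, policy_len=6, iters=200, seed=0):
    """Stochastic local search over option sequences: a crude stand-in for
    combining option programs into larger, more complex programs."""
    rng = random.Random(seed)
    best = [rng.choice(options) for _ in range(policy_len)]
    best_score = evaluate(best)
    for _ in range(iters):
        candidate = list(best)
        candidate[rng.randrange(policy_len)] = rng.choice(options)  # mutate one slot
        score = evaluate(candidate)
        if score > best_score:  # keep the mutation only if return improves
            best, best_score = candidate, score
    return best, best_score

policy, score = hill_climb(OPTIONS)
print(score)
```

The design point the sketch tries to convey: because the search starts from pre-built options rather than primitive actions, each candidate policy is already composed of meaningful behaviors, which is where the sample-efficiency gain is claimed to come from.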
Problem

Research questions and friction points this paper is trying to address.

Leveraging foundation models to encode innate skills as programmatic options
Learning options zero-shot, without environment interaction
Improving sampling efficiency in reinforcement learning with pre-learned options
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages foundation models for programmatic policies
Learns options from human knowledge zero-shot
Combines options into complex programs efficiently