Learning Game-Playing Agents with Generative Code Optimization

📅 2025-08-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of enabling game agents to perform long-horizon, complex reasoning and to optimize themselves autonomously with minimal human intervention. We propose a generative code optimization framework that explicitly represents policies as executable, evolvable Python programs. Leveraging large language models (LLMs), the agent iteratively refines its policy by jointly analyzing program execution traces and natural-language feedback. Our core contribution is replacing conventional neural network parameterizations with programmatic policy representations, endowing agents with explicit logical reasoning, debugging, and evolutionary capabilities. Evaluated on the Atari benchmark, our approach achieves performance competitive with state-of-the-art deep reinforcement learning methods while reducing training time by ~40% and environment interactions by ~60%. Crucially, it significantly diminishes reliance on handcrafted reward functions and extensive trial-and-error interaction. These results demonstrate the promise of programmatic policy representations for sample-efficient, interpretable, and human-aligned agent learning.

📝 Abstract
We present a generative optimization approach for learning game-playing agents, where policies are represented as Python programs and refined using large language models (LLMs). Our method treats decision-making policies as self-evolving code, with the current observation as input and an in-game action as output, enabling agents to self-improve through execution traces and natural-language feedback with minimal human intervention. Applied to Atari games, our game-playing Python program achieves performance competitive with deep reinforcement learning (RL) baselines while using significantly less training time and far fewer environment interactions. This work highlights the promise of programmatic policy representations for building efficient, adaptable agents capable of complex, long-horizon reasoning.
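The abstract describes a loop in which the policy is a Python program (observation in, action out) that an LLM rewrites based on execution traces and feedback. The sketch below illustrates that loop under stated assumptions: the environment is replaced by a toy observation sequence with a toy reward, and `llm_revise` is a hypothetical stand-in for the actual LLM call, which the paper does not specify here; all function names are illustrative, not the authors' API.

```python
# Hedged sketch of a generative code optimization loop: the policy is a
# Python source string, executed to get a callable, evaluated to collect a
# trace, then handed to an (stubbed) LLM for revision. Illustrative only.
from typing import Any, Callable, Dict, List, Tuple

INITIAL_POLICY = '''
def policy(observation):
    # Trivial starting policy: always take action 0 (NOOP in Atari).
    return 0
'''


def compile_policy(source: str) -> Callable[[Any], int]:
    """Execute the policy source and return its `policy` callable."""
    namespace: Dict[str, Any] = {}
    exec(source, namespace)
    return namespace["policy"]


def rollout(policy: Callable[[Any], int],
            observations: List[int]) -> Tuple[float, List[Tuple[int, int]]]:
    """Run the policy over a toy observation sequence, collecting a trace.
    A real implementation would step a Gym/ALE environment instead."""
    trace = []
    total_reward = 0.0
    for obs in observations:
        action = policy(obs)
        trace.append((obs, action))
        total_reward += 1.0 if action == obs % 4 else 0.0  # toy reward signal
    return total_reward, trace


def llm_revise(source: str, trace: list, feedback: str) -> str:
    """Placeholder for the LLM call that rewrites the policy program given
    its execution trace and natural-language feedback (hypothetical)."""
    # For illustration, return a hand-written "improved" program.
    return '''
def policy(observation):
    # Revised policy proposed from trace analysis (toy example).
    return observation % 4
'''


def optimize(generations: int = 2) -> str:
    """Iterate compile -> rollout -> revise, returning the final source."""
    source = INITIAL_POLICY
    observations = list(range(20))
    for _ in range(generations):
        policy = compile_policy(source)
        reward, trace = rollout(policy, observations)
        source = llm_revise(source, trace, feedback=f"reward={reward}")
    return source
```

In the paper's actual setup, the rollout would interact with Atari and the revision step would condition an LLM on the program text, its traces, and feedback; this sketch only shows the shape of the loop.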
Problem

Research questions and friction points this paper is trying to address.

Learning game-playing agents via generative code optimization
Refining Python program policies using large language models
Achieving competitive performance with fewer environment interactions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative code optimization for agent learning
Self-evolving Python policies using LLMs
Programmatic agents with minimal environment interaction
Zhiyi Kuang
Department of Computer Science, Stanford University
Ryan Rong
Department of Computer Science, Stanford University
YuCheng Yuan
Department of Computer Science, Stanford University
Allen Nie
Stanford University
Reinforcement Learning · Natural Language Processing · Clinical Decision Making · Education