Learning Game-Playing Agents with Generative Code Optimization

📅 2025-08-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of enabling game agents to perform long-horizon, complex reasoning and to optimize themselves autonomously with minimal human intervention. We propose a generative code optimization framework that explicitly represents policies as executable, evolvable Python programs. Leveraging large language models (LLMs), the agent iteratively refines its policy by jointly analyzing program execution traces and natural-language feedback. Our core contribution is replacing conventional neural network parameterizations with programmatic policy representations, endowing agents with explicit logical reasoning, debugging, and evolutionary capabilities. Evaluated on the Atari benchmark, our approach achieves performance competitive with state-of-the-art deep reinforcement learning methods while reducing training time by ~40% and environment interactions by ~60%. Crucially, it significantly diminishes reliance on handcrafted reward functions and extensive trial-and-error interaction. These results demonstrate the promise of programmatic policy representations for sample-efficient, interpretable, and human-aligned agent learning.

📝 Abstract
We present a generative optimization approach for learning game-playing agents, where policies are represented as Python programs and refined using large language models (LLMs). Our method treats decision-making policies as self-evolving code, with the current observation as input and an in-game action as output, enabling agents to self-improve through execution traces and natural-language feedback with minimal human intervention. Applied to Atari games, our game-playing Python program achieves performance competitive with deep reinforcement learning (RL) baselines while using significantly less training time and far fewer environment interactions. This work highlights the promise of programmatic policy representations for building efficient, adaptable agents capable of complex, long-horizon reasoning.
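The abstract describes a loop in which the policy is a Python program (observation in, action out) that an LLM rewrites based on execution traces and feedback. The sketch below illustrates that loop under stated assumptions: the environment is replaced by a toy observation sequence with a toy reward, and `llm_revise` is a hypothetical stand-in for the actual LLM call, which the paper does not specify here; all function names are illustrative, not the authors' API.

```python
# Hedged sketch of a generative code optimization loop: the policy is a
# Python source string, executed to get a callable, evaluated to collect a
# trace, then handed to an (stubbed) LLM for revision. Illustrative only.
from typing import Any, Callable, Dict, List, Tuple

INITIAL_POLICY = '''
def policy(observation):
    # Trivial starting policy: always take action 0 (NOOP in Atari).
    return 0
'''


def compile_policy(source: str) -> Callable[[Any], int]:
    """Execute the policy source and return its `policy` callable."""
    namespace: Dict[str, Any] = {}
    exec(source, namespace)
    return namespace["policy"]


def rollout(policy: Callable[[Any], int],
            observations: List[int]) -> Tuple[float, List[Tuple[int, int]]]:
    """Run the policy over a toy observation sequence, collecting a trace.
    A real implementation would step a Gym/ALE environment instead."""
    trace = []
    total_reward = 0.0
    for obs in observations:
        action = policy(obs)
        trace.append((obs, action))
        total_reward += 1.0 if action == obs % 4 else 0.0  # toy reward signal
    return total_reward, trace


def llm_revise(source: str, trace: list, feedback: str) -> str:
    """Placeholder for the LLM call that rewrites the policy program given
    its execution trace and natural-language feedback (hypothetical)."""
    # For illustration, return a hand-written "improved" program.
    return '''
def policy(observation):
    # Revised policy proposed from trace analysis (toy example).
    return observation % 4
'''


def optimize(generations: int = 2) -> str:
    """Iterate compile -> rollout -> revise, returning the final source."""
    source = INITIAL_POLICY
    observations = list(range(20))
    for _ in range(generations):
        policy = compile_policy(source)
        reward, trace = rollout(policy, observations)
        source = llm_revise(source, trace, feedback=f"reward={reward}")
    return source
```

In the paper's actual setup, the rollout would interact with Atari and the revision step would condition an LLM on the program text, its traces, and feedback; this sketch only shows the shape of the loop.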
Problem

Research questions and friction points this paper is trying to address.

Learning game-playing agents via generative code optimization
Refining Python program policies using large language models
Achieving competitive performance with fewer environment interactions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative code optimization for agent learning
Self-evolving Python policies using LLMs
Programmatic agents with minimal environment interaction
Zhiyi Kuang
Department of Computer Science, Stanford University
Ryan Rong
Department of Computer Science, Stanford University
YuCheng Yuan
Department of Computer Science, Stanford University
Allen Nie
Stanford University
Reinforcement Learning · Natural Language Processing · Clinical Decision Making · Education