Self-correcting Reward Shaping via Language Models for Reinforcement Learning Agents in Games

📅 2025-06-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Designing reward functions for reinforcement learning (RL) agents in games traditionally relies heavily on domain expertise and struggles to adapt to dynamic content changes. Method: This paper proposes an LLM-based automated iterative reward weight optimization method that takes user-specified behavioral objectives as input and leverages agent training feedback—such as success rate and episode length—to perform closed-loop, multi-round LLM reasoning for reward weight self-calibration, eliminating manual intervention. Contribution/Results: To our knowledge, this is the first work to integrate LLMs into online adaptive optimization of RL reward functions, substantially reducing dependence on human experts. Evaluated on a racing task, the approach improves agent success rate from 9% to 80% and reduces average lap steps to 855—performance approaching that achieved by expert manual tuning.

📝 Abstract
Reinforcement Learning (RL) in games has gained significant momentum in recent years, enabling the creation of different agent behaviors that can transform a player's gaming experience. However, deploying RL agents in production environments presents two key challenges: (1) designing an effective reward function typically requires an RL expert, and (2) when a game's content or mechanics are modified, previously tuned reward weights may no longer be optimal. Towards the latter challenge, we propose an automated approach for iteratively fine-tuning an RL agent's reward function weights, based on a user-defined, language-based behavioral goal. A Language Model (LM) proposes updated weights at each iteration based on this target behavior and a summary of performance statistics from prior training rounds. This closed-loop process allows the LM to self-correct and refine its output over time, producing increasingly aligned behavior without the need for manual reward engineering. We evaluate our approach in a racing task and show that it consistently improves agent performance across iterations. The LM-guided agents show a significant increase in performance, from a 9% to a 74% success rate, in just one iteration. We compare our LM-guided tuning against a human expert's manual weight design in the racing task: by the final iteration, the LM-tuned agent achieved an 80% success rate and completed laps in an average of 855 time steps, a competitive performance against the expert-tuned agent's peak of 94% success and 850 time steps.
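The closed-loop process described in the abstract can be sketched as a simple tuning loop: the LM proposes reward weights from the behavioral goal and prior-round statistics, an RL training round produces new statistics, and those feed back into the next proposal. The sketch below is illustrative only; `propose_weights` and `train_and_evaluate` are hypothetical stand-ins (a heuristic and a toy model, respectively) for the paper's actual LM prompting and training pipeline, which are not specified here.

```python
def propose_weights(goal, history):
    """Stand-in for the LM call: given the behavioral goal and a summary
    of prior training rounds, return updated reward weights. A trivial
    heuristic replaces the LM's reasoning for this sketch."""
    if not history:
        return {"progress": 1.0, "collision": -1.0}  # assumed initial weights
    last = history[-1]
    weights = dict(last["weights"])
    if last["success_rate"] < 0.8:
        # The LM would reason over the stats and the goal here; we just
        # nudge the progress term upward when success is low.
        weights["progress"] *= 1.5
    return weights

def train_and_evaluate(weights):
    """Stand-in for one RL training round; returns summary statistics.
    Toy model: more progress weight yields a higher (capped) success rate."""
    rate = min(0.9, 0.18 * weights["progress"])
    return {"success_rate": rate, "avg_steps": 1000 - 200 * rate}

def tune(goal, iterations=3):
    """Closed-loop tuning: propose weights, train, record stats, repeat."""
    history = []
    for _ in range(iterations):
        weights = propose_weights(goal, history)
        stats = train_and_evaluate(weights)
        history.append({"weights": weights, **stats})
    return history

history = tune("complete laps quickly without collisions")
```

Under this toy model, the success rate rises across iterations, mirroring the self-correcting behavior the paper reports, though the real system's gains come from LM reasoning over actual training feedback rather than a fixed heuristic.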
Problem

Research questions and friction points this paper is trying to address.

Automates reward function tuning for RL agents in games
Adapts reward weights to game content changes automatically
Uses language models to align behavior with goals
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated reward function fine-tuning via LM
LM self-corrects weights using performance stats
Language-based behavioral goal guides RL tuning
António Afonso
SEED - Electronic Arts (EA), Stockholm, Sweden
Iolanda Leite
Associate Professor at KTH Royal Institute of Technology
Human-Robot Interaction · Artificial Intelligence · Social Robotics · Multimodal Interaction
Alessandro Sestini
SEED - Electronic Arts (EA), Stockholm, Sweden
Florian Fuchs
SEED - Electronic Arts (EA), Stockholm, Sweden
Konrad Tollmar
SEED - Electronic Arts (EA), Stockholm, Sweden
Linus Gisslén
SEED - Electronic Arts (EA), Stockholm, Sweden