🤖 AI Summary
Designing reward functions for reinforcement learning (RL) agents in games traditionally relies heavily on domain expertise and adapts poorly to changes in game content. Method: the paper proposes an automated, LLM-based method for iterative reward-weight optimization. Given a user-specified behavioral objective, it feeds agent training statistics (such as success rate and episode length) back into multi-round, closed-loop LLM reasoning that self-calibrates the reward weights without manual intervention. Contribution/Results: To our knowledge, this is the first work to integrate LLMs into online adaptive optimization of RL reward functions, substantially reducing dependence on human experts. Evaluated on a racing task, the approach improves the agent's success rate from 9% to 80% and reduces average lap length to 855 time steps, approaching the performance achieved by expert manual tuning.
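As a concrete illustration of what is being tuned, the reward can be read as a weighted sum of fixed, hand-designed terms, with the LLM adjusting only the weights. Below is a minimal sketch under that assumption; the component names (progress, speed, collision, off-track) are illustrative stand-ins, since neither the summary nor the abstract lists the paper's actual reward terms.

```python
# Minimal sketch of a weighted-sum reward for a racing task.
# The component names and weight values are illustrative assumptions,
# NOT the paper's actual reward terms.

def shaped_reward(obs: dict, weights: dict) -> float:
    """Reward = weighted sum of fixed, hand-designed terms.

    The LLM-based tuner adjusts only `weights`; the terms stay fixed.
    """
    terms = {
        "progress": obs["track_progress_delta"],         # forward progress this step
        "speed": obs["speed"] / obs["max_speed"],         # normalized speed
        "collision": -1.0 if obs["collided"] else 0.0,    # crash penalty
        "off_track": -1.0 if obs["off_track"] else 0.0,   # leaving-the-track penalty
    }
    return sum(weights[name] * value for name, value in terms.items())

# Example: one candidate weight vector the tuner might propose.
example_obs = {"track_progress_delta": 0.8, "speed": 12.0, "max_speed": 20.0,
               "collided": False, "off_track": False}
example_weights = {"progress": 1.0, "speed": 0.3, "collision": 5.0, "off_track": 2.0}
print(shaped_reward(example_obs, example_weights))  # 0.8*1.0 + 0.6*0.3 = 0.98
```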
📝 Abstract
Reinforcement Learning (RL) in games has gained significant momentum in recent years, enabling the creation of different agent behaviors that can transform a player's gaming experience. However, deploying RL agents in production environments presents two key challenges: (1) designing an effective reward function typically requires an RL expert, and (2) when a game's content or mechanics are modified, previously tuned reward weights may no longer be optimal. To address the latter challenge, we propose an automated approach for iteratively fine-tuning an RL agent's reward function weights based on a user-defined, natural-language behavioral goal. A Language Model (LM) proposes updated weights at each iteration, guided by this target behavior and a summary of performance statistics from prior training rounds. This closed-loop process allows the LM to self-correct and refine its output over time, producing increasingly aligned behavior without the need for manual reward engineering. We evaluate our approach in a racing task and show that it consistently improves agent performance across iterations. The LM-guided agents show a significant increase in performance, from a $9\%$ to a $74\%$ success rate in just one iteration. We also compare our LM-guided tuning against a human expert's manual weight design in the racing task: by the final iteration, the LM-tuned agent achieved an $80\%$ success rate and completed laps in an average of $855$ time steps, competitive with the expert-tuned agent's peak of $94\%$ success and $850$ time steps.
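To make the closed loop concrete, here is a minimal sketch of the iterate-train-feedback cycle the abstract describes. The function names (`train_and_evaluate`, `query_llm`), the prompt wording, and the JSON weight format are hypothetical placeholders, not the paper's implementation.

```python
import json

def train_and_evaluate(weights: dict) -> dict:
    """Placeholder: run one RL training round with the given reward
    weights and return summary statistics (the feedback signal)."""
    raise NotImplementedError  # plug in your RL training pipeline here

def query_llm(prompt: str) -> str:
    """Placeholder: call an LLM and return its raw text reply."""
    raise NotImplementedError  # plug in your LLM client here

def tune_reward_weights(goal: str, init_weights: dict, n_rounds: int = 5):
    """Closed-loop reward-weight tuning: each round, the LM sees the
    behavioral goal plus a history of (weights -> stats) pairs and
    proposes new weights, letting it self-correct across iterations."""
    weights, history = init_weights, []
    for _ in range(n_rounds):
        stats = train_and_evaluate(weights)  # e.g. {"success_rate": 0.09, "avg_steps": 1200}
        history.append({"weights": weights, "stats": stats})
        prompt = (
            f"Target behavior: {goal}\n"
            f"Training history (weights -> stats): {json.dumps(history)}\n"
            "Reply with updated reward weights as a JSON object."
        )
        weights = json.loads(query_llm(prompt))  # parse the LM's proposal
    return weights, history
```

Constraining the LM's reply to a flat weight vector, rather than free-form reward code, keeps each round easy to parse and validate automatically; the exact prompt and parsing details here are assumptions, not taken from the paper.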