Automated Reward Design for Gran Turismo

📅 2025-11-03
🤖 AI Summary
This work addresses the difficulty of manually designing reward functions for reinforcement learning in complex, high-fidelity environments such as *Gran Turismo 7*, where hand-crafted rewards are brittle and labor-intensive to produce. We propose an end-to-end, text-driven framework for automatic reward design. It combines (i) large language models (LLMs) that generate executable reward functions from natural-language instructions (e.g., “aggressive overtaking” or “energy-efficient cornering”), (ii) vision-language models (VLMs) that assess driving behaviors from visual observations through preference comparisons, and (iii) iterative human feedback that refines the reward search. The resulting agents are competitive with GT Sophy, a champion-level racing agent, and the system can also produce novel driving behaviors. Experiments in Gran Turismo 7 demonstrate the framework's effectiveness on a high-dimensional continuous-control task and point toward practical automated reward design in real-world applications.
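
As a rough illustration of how components (i) to (iii) could fit together, here is a minimal Python sketch of an iterative reward-search loop. It is a hypothetical outline, not the authors' implementation: `reward_search`, `propose`, `train`, `prefer` and `human_feedback` are placeholder names standing in for the LLM, the RL trainer, the VLM preference judge and the human reviewer, and picking a winner by round-robin pairwise preferences is an assumption made for illustration.

```python
from typing import Callable, List, Optional, Tuple

RewardCode = str  # source code of one candidate reward function
Agent = object    # opaque trained policy; its real type depends on the RL trainer

def reward_search(
    instruction: str,
    propose: Callable[[str, Optional[str]], List[RewardCode]],  # placeholder for the LLM
    train: Callable[[RewardCode], Agent],                       # placeholder for RL training
    prefer: Callable[[Agent, Agent, str], bool],                # placeholder for the VLM judge
    human_feedback: Callable[[Agent], Optional[str]],           # None means "accept"
    max_rounds: int = 3,
) -> Tuple[RewardCode, Agent]:
    """Propose rewards, train agents, keep the most-preferred one, refine with feedback."""
    feedback: Optional[str] = None
    best: Optional[Tuple[RewardCode, Agent]] = None
    for _ in range(max_rounds):
        pool = [(code, train(code)) for code in propose(instruction, feedback)]
        if best is not None:
            pool.append(best)  # the incumbent must beat the new candidates to survive
        if not pool:
            raise ValueError("no candidate reward functions were proposed")
        # Round-robin tournament: count how often each agent is preferred over the others.
        wins = [
            sum(prefer(pool[i][1], pool[j][1], instruction) for j in range(len(pool)) if j != i)
            for i in range(len(pool))
        ]
        best = pool[max(range(len(pool)), key=wins.__getitem__)]
        feedback = human_feedback(best[1])
        if feedback is None:  # the human accepts the current best agent
            break
    assert best is not None
    return best

# Toy stand-ins: an "agent" is just the integer weight parsed from the reward code.
def toy_propose(instruction, feedback):
    return ["w=1", "w=2"] if feedback is None else ["w=3", "w=0"]

result = reward_search(
    "overtake aggressively",
    propose=toy_propose,
    train=lambda code: int(code.split("=")[1]),
    prefer=lambda a, b, instr: a > b,
    human_feedback=lambda agent: None if agent >= 3 else "be more aggressive",
)
print(result)  # ('w=3', 3)
```

Passing the text instruction to `prefer` reflects the idea that the judge compares behaviors against the requested behavior rather than against a fixed metric; a real system would replace each toy lambda with model calls and full RL training runs.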

📝 Abstract
When designing reinforcement learning (RL) agents, a designer communicates the desired agent behavior through the definition of reward functions: numerical feedback given to the agent as reward or punishment for its actions. However, mapping desired behaviors to reward functions can be a difficult process, especially in complex environments such as autonomous racing. In this paper, we demonstrate how current foundation models can effectively search over a space of reward functions to produce desirable RL agents for the Gran Turismo 7 racing game, given only text-based instructions. Through a combination of LLM-based reward generation, VLM preference-based evaluation, and human feedback, we demonstrate how our system can be used to produce racing agents competitive with GT Sophy, a champion-level RL racing agent, as well as generate novel behaviors, paving the way for practical automated reward design in real-world applications.
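
To make the idea of a reward function as numerical feedback concrete, the following is a minimal, hypothetical Python sketch of a hand-written racing reward of the kind the framework aims to search over automatically. The `RacingStep` fields and the penalty weights are illustrative assumptions, not the paper's reward or any Gran Turismo 7 interface.

```python
from dataclasses import dataclass

@dataclass
class RacingStep:
    """Hypothetical per-step observation; field names are illustrative only."""
    progress_m: float  # metres of track progress gained this step
    off_course: bool   # True if the car left the track boundaries
    collision: bool    # True if the car made contact with another car or a wall

def reward(step: RacingStep,
           off_course_penalty: float = 5.0,
           collision_penalty: float = 10.0) -> float:
    """Reward progress along the track; punish leaving the track and collisions."""
    r = step.progress_m
    if step.off_course:
        r -= off_course_penalty
    if step.collision:
        r -= collision_penalty
    return r

print(reward(RacingStep(progress_m=12.0, off_course=False, collision=False)))  # 12.0
print(reward(RacingStep(progress_m=8.0, off_course=True, collision=True)))     # -7.0
```

Tuning weights like these by hand is the labor-intensive step the framework is meant to automate.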

Problem

Automating reward function design for reinforcement learning agents
Mapping text instructions to reward functions in complex environments
Generating competitive racing agents through automated reward search

Innovation

LLM-based reward generation from text instructions
VLM preference-based evaluation of agent behaviors
Human feedback integration for automated reward design
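
For the LLM-based reward generation step, generated reward code has to be turned into something an RL trainer can call. The sketch below shows one hypothetical way to do that in Python: execute the generated source with a small set of builtins, look up an assumed `compute_reward(obs)` entry point, and reject candidates that fail on a sample observation. The function name, observation keys, and validation checks are assumptions for illustration, not the paper's interface.

```python
import math
from typing import Callable, Dict

Obs = Dict[str, float]

# Hypothetical LLM output for an instruction like "drive fast but stay on the racing line".
LLM_OUTPUT = '''
def compute_reward(obs):
    return obs["speed_kph"] / 100.0 - 0.5 * abs(obs["line_offset_m"])
'''

def load_reward_fn(source: str, sample_obs: Obs) -> Callable[[Obs], float]:
    """Execute LLM-generated source and return its compute_reward function.

    Only a few builtins are exposed to the generated code. This limits accidental
    misuse but is NOT a security sandbox.
    """
    env = {"__builtins__": {"abs": abs, "min": min, "max": max}, "math": math}
    try:
        exec(source, env)  # defines compute_reward inside env
    except Exception as exc:
        raise ValueError(f"generated code failed to execute: {exc!r}") from exc
    fn = env.get("compute_reward")
    if not callable(fn):
        raise ValueError("generated code does not define compute_reward(obs)")
    # Smoke-test on a sample observation so broken candidates are rejected
    # before any RL training time is spent on them.
    try:
        value = fn(sample_obs)
    except Exception as exc:
        raise ValueError(f"compute_reward failed on sample input: {exc!r}") from exc
    if not isinstance(value, (int, float)) or not math.isfinite(value):
        raise ValueError(f"compute_reward returned a non-numeric or non-finite value: {value!r}")
    return fn

sample = {"speed_kph": 200.0, "line_offset_m": 1.0}
reward_fn = load_reward_fn(LLM_OUTPUT, sample)
print(reward_fn(sample))  # 1.5
```

Restricting builtins in `exec` is not a security boundary; running untrusted model output safely requires process- or container-level isolation.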

Authors

Michel Ma (Mila, University of Montreal)
Takuma Seno (Turing Inc.)
Kaushik Subramanian (Sony AI)
Peter R. Wurman (Sony AI)
Peter Stone (Sony AI, UT Austin)
Craig Sherstan (Sony AI)