Automated Reward Design for Gran Turismo

📅 2025-11-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the difficulty of manually designing reward functions for reinforcement learning in complex, high-fidelity environments such as *Gran Turismo 7*, where hand-crafted rewards tend to be brittle and labor-intensive to tune. It proposes a text-driven framework for automated reward design that combines (i) large language models (LLMs) that generate executable reward functions from natural-language instructions, (ii) vision-language models (VLMs) that evaluate the resulting driving behaviors through preference-based comparison, and (iii) human feedback that guides the reward search. The trained agents are competitive with GT Sophy, a champion-level RL racing agent, and the system can also generate novel driving behaviors. The authors present these results as a step toward practical automated reward design in real-world applications.

📝 Abstract

When designing reinforcement learning (RL) agents, a designer communicates the desired agent behavior through the definition of reward functions: numerical feedback given to the agent as reward or punishment for its actions. However, mapping desired behaviors to reward functions can be a difficult process, especially in complex environments such as autonomous racing. In this paper, we demonstrate how current foundation models can effectively search over a space of reward functions to produce desirable RL agents for the Gran Turismo 7 racing game, given only text-based instructions. Through a combination of LLM-based reward generation, VLM preference-based evaluation, and human feedback, we demonstrate how our system can be used to produce racing agents competitive with GT Sophy, a champion-level RL racing agent, as well as generate novel behaviors, paving the way for practical automated reward design in real-world applications.
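To make the notion of a reward function concrete, the following is a minimal hand-written sketch of the kind of per-step function a designer (or, in this work, an LLM) might write for a racing task. Every field name, weight, and penalty here is a hypothetical illustration; none of it is taken from the paper or from the Gran Turismo 7 interface.

```python
from dataclasses import dataclass


@dataclass
class StepInfo:
    """Hypothetical per-step signals; not the real GT7 observation space."""
    progress_m: float  # distance advanced along the track this step, in meters
    off_track: bool    # True if the car left the track during this step
    collided: bool     # True if the car hit a wall or another car


def reward(step: StepInfo,
           w_progress: float = 1.0,
           off_track_penalty: float = 5.0,
           collision_penalty: float = 10.0) -> float:
    """Return a scalar reward: progress earns reward, violations are punished."""
    r = w_progress * step.progress_m
    if step.off_track:
        r -= off_track_penalty
    if step.collided:
        r -= collision_penalty
    return r


# Usage: a clean step versus a step that leaves the track.
clean = reward(StepInfo(progress_m=2.0, off_track=False, collided=False))
sloppy = reward(StepInfo(progress_m=2.0, off_track=True, collided=False))
```

Even in this toy form, the designer has to choose three numbers whose interaction determines the learned behavior, which is the mapping problem the abstract describes.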

Problem

Research questions and friction points this paper is trying to address.

Automating reward function design for reinforcement learning agents

Mapping text instructions to reward functions in complex environments

Generating competitive racing agents through automated reward search

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based reward generation from text instructions

VLM preference-based evaluation of agent behaviors

Human feedback integration for automated reward design
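The three components listed above can be read as a generate, evaluate, and select loop. The sketch below is a toy illustration of that loop under strong assumptions, not the authors' implementation: `propose` stands in for LLM reward generation, `rollout_summary` stands in for RL training plus a rollout, and `prefer` stands in for a VLM or human preference judgment. All names and the closed-form "behavior" model are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical candidate: (progress weight, off-track penalty) of a reward function.
Weights = Tuple[float, float]


def propose(best: Optional[Weights]) -> List[Weights]:
    """Stand-in for the LLM: propose variations of the current best candidate."""
    p, q = best if best is not None else (1.0, 1.0)
    return [(p, q), (p, q * 2.0), (p * 2.0, q), (p * 2.0, q * 2.0)]


def rollout_summary(w: Weights) -> Dict[str, float]:
    """Stand-in for training an agent and summarizing its behavior (toy model)."""
    progress_w, penalty = w
    return {
        "lap_time": 90.0 + 10.0 / progress_w,               # faster with more progress weight
        "off_track_events": max(0.0, progress_w - penalty),  # sloppier if penalty is too low
    }


def prefer(a: Dict[str, float], b: Dict[str, float]) -> bool:
    """Stand-in for a preference judgment: True if behavior a is strictly preferred to b."""
    key_a = (a["off_track_events"], a["lap_time"])
    key_b = (b["off_track_events"], b["lap_time"])
    return key_a < key_b


def search(rounds: int) -> Weights:
    """Repeat propose, evaluate, and select, keeping the preferred candidate."""
    best: Optional[Weights] = None
    for _ in range(rounds):
        candidates = propose(best)
        champion = candidates[0]
        for challenger in candidates[1:]:
            if prefer(rollout_summary(challenger), rollout_summary(champion)):
                champion = challenger
        best = champion
    assert best is not None
    return best


best_weights = search(rounds=3)
```

In a real system each stand-in would be far more expensive: proposing candidates means prompting a model with the text instruction, and each evaluation requires training an agent, so the number of rounds and candidates per round is a central design choice.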

🔎 Similar Papers

A Review of Reward Functions for Reinforcement Learning in the context of Autonomous Driving

2024-04-12 · 2024 IEEE Intelligent Vehicles Symposium (IV) · Citations: 8

REvolve: Reward Evolution with Large Language Models using Human Feedback

2024-06-03 · Citations: 1
