🤖 AI Summary
This work addresses the difficulty of manually designing reward functions for reinforcement learning in complex, high-fidelity environments such as *Gran Turismo 7*, where hand-crafted rewards are brittle and labor-intensive. It proposes an end-to-end, text-driven framework for automatic reward design that combines three components: (i) large language models (LLMs) that generate executable reward functions from natural language instructions (e.g., “aggressive overtaking” or “energy-efficient cornering”), (ii) vision-language models (VLMs) that assess driving behaviors by preference from visual observations, and (iii) iterative human-in-the-loop feedback that refines the reward search. The resulting agents are competitive with GT Sophy, a champion-level RL racing agent, and the system can also produce novel behaviors. The experiments are conducted in a high-fidelity racing simulator with high-dimensional continuous control, which the authors present as a step toward practical automated reward design in real-world applications.
📝 Abstract
When designing reinforcement learning (RL) agents, a designer communicates the desired agent behavior by defining reward functions: numerical feedback given to the agent as reward or punishment for its actions. However, mapping desired behaviors to reward functions can be difficult, especially in complex environments such as autonomous racing. In this paper, we demonstrate how current foundation models can effectively search over a space of reward functions to produce desirable RL agents for the Gran Turismo 7 racing game, given only text-based instructions. Through a combination of LLM-based reward generation, VLM preference-based evaluation, and human feedback, we show that our system can produce racing agents competitive with GT Sophy, a champion-level RL racing agent, as well as generate novel behaviors, paving the way for practical automated reward design in real-world applications.
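To make the search loop described in the abstract more concrete, the following is a minimal toy sketch in Python. It is not the authors' implementation, and every name and number in it is hypothetical. `propose_rewards` stands in for LLM-based reward generation (here it only samples a penalty weight), `train` stands in for RL training (here it picks the toy behavior that maximizes the candidate reward), and `prefer` stands in for VLM preference-based evaluation (here a fixed rule that favors clean driving, then speed). The human-feedback step is omitted; it could plausibly enter as an extra filter on the candidates.

```python
import random
from typing import Dict, List, Optional, Tuple

Behavior = Dict[str, float]  # summary statistics of one rollout (toy)
Weights = Dict[str, float]   # parameters of one candidate reward (toy)

# Hypothetical behaviors a trained agent could converge to (made-up numbers).
BEHAVIORS: List[Behavior] = [
    {"progress": 1.0, "off_track": 0.8},  # fast but cuts corners
    {"progress": 0.8, "off_track": 0.1},  # fast and clean
    {"progress": 0.4, "off_track": 0.0},  # slow and clean
]


def propose_rewards(n: int, rng: random.Random) -> List[Weights]:
    """Stand-in for LLM reward generation: sample off-track penalty weights."""
    return [{"off_track": rng.uniform(0.0, 3.0)} for _ in range(n)]


def reward(w: Weights, b: Behavior) -> float:
    """Candidate reward: course progress minus a weighted off-track penalty."""
    return b["progress"] - w["off_track"] * b["off_track"]


def train(w: Weights) -> Behavior:
    """Stand-in for RL training: return the behavior that maximizes the reward."""
    return max(BEHAVIORS, key=lambda b: reward(w, b))


def prefer(a: Behavior, b: Behavior) -> bool:
    """Stand-in for VLM preference: True if `a` is strictly preferred to `b`.

    Toy rule: staying on track matters first, then progress.
    """
    def score(x: Behavior) -> Tuple[bool, float]:
        return (x["off_track"] <= 0.2, x["progress"])
    return score(a) > score(b)


def search(candidates: List[Weights]) -> Tuple[Optional[Weights], Optional[Behavior]]:
    """Keep the candidate reward whose trained behavior wins every comparison so far."""
    best_w: Optional[Weights] = None
    best_b: Optional[Behavior] = None
    for w in candidates:
        b = train(w)
        if best_b is None or prefer(b, best_b):
            best_w, best_b = w, b
    return best_w, best_b


if __name__ == "__main__":
    rng = random.Random(0)
    weights, behavior = search(propose_rewards(8, rng))
    print("selected reward weights:", weights)
    print("resulting behavior:", behavior)
```

The point of the sketch is the separation of roles: candidate rewards are scored by the behavior they induce after training, not by their numerical reward values, which is why a preference judge over rollouts can rank rewards that are otherwise not comparable.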