🤖 AI Summary
To address the limited generalization of model-based reinforcement learning (MBRL) in autonomous racing under varying opponent behaviors, this paper formulates head-to-head racing as a contextual Markov decision process (CMDP). The authors propose cMask, a novel approach that parameterizes opponent behavior as context, jointly models state transitions and reward functions conditioned on this context, and introduces a context-aware masking mechanism to enable dynamic model adaptation. Compared to existing contextual MBRL methods, cMask significantly improves generalization and robustness to out-of-distribution opponent behaviors, and in the Roboracer simulation environment it achieves superior racing performance with higher sample efficiency. cMask also outperforms baseline methods in in-distribution scenarios, demonstrating consistent gains across diverse opponent policies. The method thus advances contextual modeling in competitive multi-agent RL by enabling adaptive, context-sensitive dynamics and reward estimation without requiring explicit opponent identification or retraining.
📝 Abstract
Autonomous vehicles have shown promising potential to be a groundbreaking technology for improving the safety of road users. For these vehicles, as well as many other safety-critical robotic technologies, to be deployed in real-world applications, we require algorithms that can generalize well to unseen scenarios and data. Model-based reinforcement learning (MBRL) algorithms have demonstrated state-of-the-art performance and data efficiency across a diverse set of domains. However, these algorithms have also shown susceptibility to changes in the environment and its transition dynamics.
In this work, we explore the performance and generalization capabilities of MBRL algorithms for autonomous driving, specifically in the simulated autonomous racing environment Roboracer (formerly F1Tenth). We frame the head-to-head racing task as a learning problem using contextual Markov decision processes and parameterize the driving behavior of the adversaries using the context of the episode, thereby also parameterizing the transition and reward dynamics. We benchmark the behavior of MBRL algorithms in this environment and propose cMask, a novel context-aware extension of existing approaches. We demonstrate that context-aware MBRL algorithms generalize better to out-of-distribution adversary behaviors than context-free approaches. We also demonstrate that cMask displays strong generalization capabilities, as well as further performance improvements over other context-aware MBRL approaches when racing against adversaries with in-distribution behaviors.
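To make the core idea concrete, the sketch below shows one plausible reading of a context-conditioned dynamics and reward model with a masking mechanism: a context vector (summarizing opponent behavior) produces a sigmoid gate that masks the hidden features of a shared transition network, so predictions adapt to the opponent without retraining. This is a minimal NumPy illustration; all class names, layer sizes, and the exact gating architecture are assumptions for exposition, not the paper's actual cMask implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

class ContextMaskedDynamics:
    """Illustrative context-conditioned dynamics/reward model.

    A context vector gates (masks) the hidden features of a shared
    transition network, so next-state and reward predictions depend
    on the opponent's behavior. Sizes and structure are illustrative.
    """

    def __init__(self, state_dim, action_dim, context_dim, hidden=32):
        in_dim = state_dim + action_dim
        # Randomly initialized weights stand in for learned parameters.
        self.W_in = rng.normal(0.0, 0.1, (in_dim, hidden))
        self.W_mask = rng.normal(0.0, 0.1, (context_dim, hidden))
        self.W_state = rng.normal(0.0, 0.1, (hidden, state_dim))
        self.W_reward = rng.normal(0.0, 0.1, (hidden, 1))

    def forward(self, state, action, context):
        # Shared feature extractor over (state, action).
        h = np.tanh(np.concatenate([state, action]) @ self.W_in)
        # Context-derived sigmoid gate in [0, 1] per hidden unit.
        mask = 1.0 / (1.0 + np.exp(-(context @ self.W_mask)))
        h = h * mask  # context-aware masking of hidden features
        # Residual next-state prediction and scalar reward estimate.
        next_state = state + h @ self.W_state
        reward = (h @ self.W_reward).item()
        return next_state, reward

# Usage: the same (state, action) pair yields different predictions
# under different opponent contexts.
model = ContextMaskedDynamics(state_dim=4, action_dim=2, context_dim=3)
s, a = np.ones(4), np.zeros(2)
ns_aggressive, _ = model.forward(s, a, np.array([1.0, 0.0, 0.0]))
ns_defensive, _ = model.forward(s, a, np.array([0.0, 1.0, 0.0]))
```

Because the gate, rather than a separate network per opponent, carries the behavioral variation, the same learned model can adapt on the fly as the inferred context changes during a race.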