InDRiVE: Reward-Free World-Model Pretraining for Autonomous Driving via Latent Disagreement

📅 2025-12-21
🤖 AI Summary
To address the reliance of autonomous driving world-model pretraining on handcrafted external rewards, and its limited generalization, this paper proposes a reward-free pretraining framework driven by ensemble disagreement in latent space. Methodologically, it uses predictive disagreement among an ensemble of latent dynamics models as an intrinsic uncertainty signal, replacing task-specific rewards, to enable planner-free autonomous exploration in CARLA. Built on the DreamerV3 architecture, the framework integrates latent-space modeling, disagreement estimation, imagination-based actor-critic learning, and online exploration optimization. Its core contribution is the first explicit formulation of model disagreement as an intrinsic reward reflecting epistemic uncertainty, enabling task-agnostic world-model reuse. Experiments demonstrate strong zero-shot robustness across diverse towns, routes, and traffic densities; under equal interaction budgets, few-shot collision-avoidance success improves by 27%.

📝 Abstract
Model-based reinforcement learning (MBRL) can reduce interaction cost for autonomous driving by learning a predictive world model, but it typically still depends on task-specific rewards that are difficult to design and often brittle under distribution shift. This paper presents InDRiVE, a DreamerV3-style MBRL agent that performs reward-free pretraining in CARLA using only intrinsic motivation derived from latent ensemble disagreement. Disagreement acts as a proxy for epistemic uncertainty and drives the agent toward under-explored driving situations, while an imagination-based actor-critic learns a planner-free exploration policy directly from the learned world model. After intrinsic pretraining, we evaluate zero-shot transfer by freezing all parameters and deploying the pretrained exploration policy in unseen towns and routes. We then study few-shot adaptation by training a task policy with limited extrinsic feedback for downstream objectives (lane following and collision avoidance). Experiments in CARLA across towns, routes, and traffic densities show that disagreement-based pretraining yields stronger zero-shot robustness and more reliable few-shot collision avoidance under town shift and matched interaction budgets, supporting intrinsic disagreement as a practical reward-free pretraining signal for reusable driving world models.
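To make the disagreement signal concrete, here is a minimal toy sketch (not the paper's implementation): the intrinsic reward is the variance across an ensemble of next-latent predictions, so states where the ensemble heads disagree score higher. The random linear heads below are hypothetical stand-ins for learned latent dynamics models.

```python
import numpy as np

rng = np.random.default_rng(0)

def disagreement_reward(ensemble_preds: np.ndarray) -> np.ndarray:
    """Intrinsic reward = per-dimension variance across ensemble heads, averaged.

    ensemble_preds: (K, B, D) array of K heads' next-latent predictions
    for a batch of B states with latent dimension D. Returns shape (B,).
    """
    return ensemble_preds.var(axis=0).mean(axis=-1)

# Toy ensemble: K linear heads mapping (latent, action) -> next latent.
K, D, A = 5, 8, 2
heads = [rng.normal(scale=0.1, size=(D + A, D)) for _ in range(K)]

def predict_next(z: np.ndarray, a: np.ndarray) -> np.ndarray:
    x = np.concatenate([z, a], axis=-1)      # (B, D + A)
    return np.stack([x @ W for W in heads])  # (K, B, D)

z = rng.normal(size=(4, D))                  # batch of 4 latent states
a = rng.normal(size=(4, A))                  # batch of 4 actions
r_int = disagreement_reward(predict_next(z, a))  # (4,), non-negative
```

In the actual framework this reward replaces the extrinsic task reward during pretraining; the exploration policy is then trained to seek high-disagreement (high epistemic-uncertainty) regions.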
Problem

Research questions and friction points this paper is trying to address.

Develops reward-free pretraining for autonomous driving world models
Uses latent disagreement to drive exploration without task-specific rewards
Evaluates zero-shot transfer and few-shot adaptation in unseen environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reward-free pretraining using latent ensemble disagreement
Imagination-based actor-critic for planner-free exploration
Zero-shot transfer and few-shot adaptation with frozen parameters
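The "imagination-based actor-critic for planner-free exploration" point means the policy is optimized on rollouts generated entirely inside the learned latent dynamics, with no simulator calls. A toy sketch, assuming a single frozen linear dynamics model and a linear policy head as hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)
D, A, H = 8, 2, 5  # latent dim, action dim, imagination horizon

W_dyn = rng.normal(scale=0.1, size=(D + A, D))  # frozen latent dynamics (stand-in)
W_pi = rng.normal(scale=0.1, size=(D, A))       # exploration policy head (stand-in)

def imagine(z0: np.ndarray, horizon: int = H) -> np.ndarray:
    """Roll the policy forward purely in latent space: z -> a -> next z."""
    z, traj = z0, []
    for _ in range(horizon):
        a = np.tanh(z @ W_pi)                               # policy action
        z = np.tanh(np.concatenate([z, a], axis=-1) @ W_dyn)  # imagined next latent
        traj.append(z)
    return np.stack(traj)  # (H, B, D)

traj = imagine(rng.normal(size=(3, D)))  # imagined trajectories for 3 start states
```

In the paper's setup the actor-critic would score each imagined step with the ensemble-disagreement reward and backpropagate through the rollout; for few-shot adaptation the same frozen world model is reused with a small amount of extrinsic task feedback.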