InDRiVE: Intrinsic Disagreement based Reinforcement for Vehicle Exploration through Curiosity Driven Generalized World Model

📅 2025-03-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing model-based reinforcement learning (MBRL) methods for autonomous driving suffer from poor generalization and heavy reliance on task-specific external rewards. To address this, we propose a task-agnostic intrinsic exploration framework: for the first time, intrinsic disagreement signals, derived from an ensemble of world models, are integrated into the Dreamer architecture, enabling uncertainty-driven active exploration and yielding task-invariant latent representations. Our approach eliminates external rewards entirely during exploration, constructing intrinsic rewards solely from the disagreement among the ensemble's latent-dynamics predictions, so that environment modeling and policy optimization are fully self-supervised. Experiments demonstrate consistent gains over DreamerV2 and DreamerV3 in both seen and unseen environments, with higher driving success rates, lower violation rates, and better sample efficiency. These results validate that intrinsic exploration substantially enhances the robustness and transferability of MBRL in autonomous driving.
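
To make the disagreement signal concrete, here is a minimal PyTorch sketch of how an intrinsic reward can be computed as the variance of an ensemble of one-step latent-dynamics predictors. The class and function names (`EnsembleDynamics`, `intrinsic_reward`), the network sizes, and the ensemble size are illustrative assumptions, not taken from the paper's implementation.

```python
import torch
import torch.nn as nn

class EnsembleDynamics(nn.Module):
    """Ensemble of one-step latent-dynamics predictors (illustrative)."""
    def __init__(self, latent_dim, action_dim, hidden=256, n_members=5):
        super().__init__()
        self.members = nn.ModuleList([
            nn.Sequential(
                nn.Linear(latent_dim + action_dim, hidden), nn.ELU(),
                nn.Linear(hidden, latent_dim),
            )
            for _ in range(n_members)
        ])

    def forward(self, z, a):
        # Each member predicts the next latent state from (latent, action).
        x = torch.cat([z, a], dim=-1)
        return torch.stack([m(x) for m in self.members], dim=0)  # (K, B, latent_dim)

def intrinsic_reward(ensemble, z, a):
    """Disagreement reward: variance across ensemble predictions, averaged over latent dims."""
    preds = ensemble(z, a)       # (K, B, latent_dim)
    var = preds.var(dim=0)       # per-dimension disagreement across the K members
    return var.mean(dim=-1)      # (B,) intrinsic reward per transition
```

High variance marks transitions the world models still disagree on, which is where the agent is steered during exploration.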

📝 Abstract
Model-based Reinforcement Learning (MBRL) has emerged as a promising paradigm for autonomous driving, where data efficiency and robustness are critical. Yet, existing solutions often rely on carefully crafted, task-specific extrinsic rewards, limiting generalization to new tasks or environments. In this paper, we propose InDRiVE (Intrinsic Disagreement based Reinforcement for Vehicle Exploration), a method that leverages purely intrinsic, disagreement-based rewards within a Dreamer-based MBRL framework. By training an ensemble of world models, the agent actively explores high-uncertainty regions of environments without any task-specific feedback. This approach yields a task-agnostic latent representation, allowing for rapid zero-shot or few-shot fine-tuning on downstream driving tasks such as lane following and collision avoidance. Experimental results in both seen and unseen environments demonstrate that InDRiVE achieves higher success rates and fewer infractions compared to DreamerV2 and DreamerV3 baselines despite using significantly fewer training steps. Our findings highlight the effectiveness of purely intrinsic exploration for learning robust vehicle control behaviors, paving the way for more scalable and adaptable autonomous driving systems.
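
As a companion to the reward sketch above, here is one plausible way the disagreement ensemble could be kept up to date: each member regresses the next latent state produced by the world model, and disagreement stays high exactly where the dynamics are still poorly learned. This is a hedged sketch reusing the `EnsembleDynamics` class from above; the helper name `train_ensemble_step` and the plain MSE objective are assumptions, not the paper's exact loss.

```python
import torch.nn.functional as F

def train_ensemble_step(ensemble, optimizer, z, a, z_next):
    """One gradient step for the disagreement ensemble (illustrative).

    z, a, z_next: batches of current latents, actions, and next latents
    obtained from the world model's representation learning / replay buffer.
    """
    preds = ensemble(z, a)                                 # (K, B, latent_dim)
    target = z_next.detach().unsqueeze(0).expand_as(preds) # same target for every member
    loss = F.mse_loss(preds, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Differing random initializations (and optionally bootstrapped minibatches) keep the members diverse, so their predictions only converge in regions of the latent space that have actually been visited.
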
Problem

Research questions and friction points this paper is trying to address.

Existing MBRL agents depend on carefully crafted, task-specific extrinsic rewards.
Policies generalize poorly to new tasks and unseen driving environments.
Reaching high success rates typically requires many training steps, i.e. poor sample efficiency.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Purely intrinsic, disagreement-based rewards within a Dreamer-based MBRL framework
Ensemble of world models to drive exploration toward high-uncertainty regions
Task-agnostic latent representation enabling rapid zero-shot/few-shot fine-tuning (see the sketch after this list)
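
One possible reading of the zero-shot/few-shot fine-tuning recipe, under the assumption that the pretrained world model and latent space are reused and only the reward driving the actor-critic changes; `policy_reward` and `task_reward_fn` are hypothetical names, not from the paper.

```python
def policy_reward(ensemble, z, a, phase="explore", task_reward_fn=None):
    """Reward used to train the actor-critic in imagination (illustrative).

    phase="explore":  purely intrinsic ensemble-disagreement reward, no task signal.
    phase="finetune": the downstream task's extrinsic reward (e.g. lane following
                      or collision avoidance), reusing the pretrained world model.
    """
    if phase == "explore":
        return intrinsic_reward(ensemble, z, a)  # sketch defined earlier
    assert task_reward_fn is not None, "fine-tuning needs a task reward"
    return task_reward_fn(z, a)
```

In the zero-shot setting the exploration-trained policy would presumably be used as-is, while few-shot fine-tuning applies a small number of task-reward updates on top of the pretrained, task-agnostic representation.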