Decoupling Scene Perception and Ego Status: A Multi-Context Fusion Approach for Enhanced Generalization in End-to-End Autonomous Driving

📅 2025-11-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing end-to-end autonomous driving systems prematurely fuse ego-state information during BEV encoding, causing scene perception to be overly dominated by strong priors and thereby limiting generalization and robustness. To address this, we propose AdaptiveAD: a dual-branch architecture that decouples scene-driven and ego-driven decision-making, coupled with an adaptive fusion module that delays ego-state injection. We further introduce a path attention mechanism to explicitly model trajectory–scene interactions, and enhance multi-task coordination via BEV unidirectional distillation and autoregressive online mapping. Evaluated on the nuScenes open-loop planning benchmark, AdaptiveAD achieves state-of-the-art performance while significantly reducing reliance on ego-state inputs. It demonstrates superior generalization in challenging scenarios—including complex intersections, occlusions, and long-tail distributions—without compromising planning accuracy or real-time feasibility.

📝 Abstract

Modular design of planning-oriented autonomous driving has markedly advanced end-to-end systems. However, existing architectures remain constrained by an over-reliance on ego status, hindering generalization and robust scene understanding. We identify the root cause as an inherent design within these architectures that allows ego status to be easily leveraged as a shortcut. Specifically, the premature fusion of ego status in the upstream BEV encoder allows an information flow from this strong prior to dominate the downstream planning module. To address this challenge, we propose AdaptiveAD, an architectural-level solution based on a multi-context fusion strategy. Its core is a dual-branch structure that explicitly decouples scene perception and ego status. One branch performs scene-driven reasoning based on multi-task learning, but with ego status deliberately omitted from the BEV encoder, while the other conducts ego-driven reasoning based solely on the planning task. A scene-aware fusion module then adaptively integrates the complementary decisions from the two branches to form the final planning trajectory. To ensure this decoupling does not compromise multi-task learning, we introduce a path attention mechanism for ego-BEV interaction and add two targeted auxiliary tasks: BEV unidirectional distillation and autoregressive online mapping. Extensive evaluations on the nuScenes dataset demonstrate that AdaptiveAD achieves state-of-the-art open-loop planning performance. Crucially, it significantly mitigates the over-reliance on ego status and exhibits impressive generalization capabilities across diverse scenarios.
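The abstract does not give the form of the scene-aware fusion module, so the following is only a minimal sketch of one plausible reading: a scalar gate derived from a pooled scene feature blends the scene-driven and ego-driven trajectories waypoint by waypoint. The names `scene_gate`, `fuse_trajectories`, `gate_weights`, and `gate_bias` are hypothetical, and the sigmoid-gated convex combination is an assumption for illustration, not the paper's confirmed design.

```python
import math


def sigmoid(z):
    """Standard logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def scene_gate(scene_feature, gate_weights, gate_bias=0.0):
    """Hypothetical gate: project a pooled scene feature to a weight in (0, 1).

    A learned module would supply gate_weights and gate_bias; here they are
    plain lists and floats so the sketch stays dependency-free.
    """
    z = sum(f * w for f, w in zip(scene_feature, gate_weights)) + gate_bias
    return sigmoid(z)


def fuse_trajectories(scene_traj, ego_traj, gate):
    """Blend two trajectories of (x, y) waypoints with a convex combination.

    gate = 1.0 returns the scene-driven trajectory unchanged,
    gate = 0.0 returns the ego-driven trajectory unchanged.
    """
    if len(scene_traj) != len(ego_traj):
        raise ValueError("both branches must predict the same number of waypoints")
    return [
        (gate * sx + (1.0 - gate) * ex, gate * sy + (1.0 - gate) * ey)
        for (sx, sy), (ex, ey) in zip(scene_traj, ego_traj)
    ]


# Toy inputs (made up for illustration): the scene branch steers right,
# the ego branch extrapolates straight ahead.
scene_traj = [(0.0, 1.0), (0.5, 2.0), (1.5, 3.0)]
ego_traj = [(0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]

gate = scene_gate(scene_feature=[0.2, -0.1, 0.7], gate_weights=[1.0, 0.5, 2.0])
fused = fuse_trajectories(scene_traj, ego_traj, gate)
```

Because the gate depends only on scene features in this sketch, the ego-driven branch cannot raise its own weight, which is one simple way to keep ego status from acting as a shortcut. A per-waypoint or per-mode gate would be an equally reasonable variant.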

Problem

Research questions and friction points this paper is trying to address.

Existing end-to-end planners over-rely on ego status, which the planning module can exploit as a shortcut

Premature fusion of ego status in the upstream BEV encoder lets this strong prior dominate downstream planning decisions

The resulting over-reliance hinders generalization and robust scene understanding

Innovation

Methods, ideas, or system contributions that make the work stand out.

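Dual-branch structure that decouples scene perception from ego status: a scene-driven branch with ego status omitted from the BEV encoder, and an ego-driven branch based solely on the planning task

Scene-aware fusion module that adaptively integrates the decisions of the two branches into the final planning trajectory

Path attention mechanism for ego-BEV interaction, plus two auxiliary tasks: BEV unidirectional distillation and autoregressive online mapping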
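The page gives no detail on how path attention is computed, so the sketch below shows only one generic interpretation: BEV features are sampled at the grid cells under a candidate trajectory, and a planning query attends over those samples with scaled dot-product attention. The function names, the nearest-cell sampling, and the grid layout (origin at cell (0, 0), `cell_size` in metres) are all assumptions made for illustration and should not be read as the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_along_path(bev, waypoints, cell_size=1.0):
    """Return the BEV feature vector at the grid cell under each waypoint.

    bev is a rows x cols grid of feature vectors (nested lists).
    Waypoints outside the grid are clamped to the nearest border cell.
    """
    rows, cols = len(bev), len(bev[0])
    feats = []
    for x, y in waypoints:
        r = min(max(int(y / cell_size), 0), rows - 1)
        c = min(max(int(x / cell_size), 0), cols - 1)
        feats.append(bev[r][c])
    return feats


def path_attention(query, bev, waypoints, cell_size=1.0):
    """Scaled dot-product attention of one query over path-sampled BEV features.

    Returns the attended context vector and the attention weights.
    """
    keys = sample_along_path(bev, waypoints, cell_size)
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    context = [
        sum(w * key[d] for w, key in zip(weights, keys))
        for d in range(len(query))
    ]
    return context, weights


# Toy 3 x 3 BEV grid with 2-dimensional features (values made up).
bev = [
    [[0.0, 0.0], [0.1, 0.0], [0.2, 0.0]],
    [[0.0, 0.5], [0.4, 0.5], [0.9, 0.1]],
    [[0.0, 1.0], [0.3, 0.8], [1.0, 1.0]],
]
waypoints = [(0.2, 0.5), (1.1, 1.4), (2.6, 2.2)]

context, weights = path_attention([1.0, 0.5], bev, waypoints)
```

Restricting the keys to cells along the path keeps the interaction focused on the part of the scene the trajectory would actually traverse. A real system would more likely use bilinear sampling and learned query, key, and value projections.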
