🤖 AI Summary
This work addresses the performance degradation that autonomous driving systems suffer under domain shift when deployed in new cities. Existing methods rely on target-domain labels or task-specific adaptations, which limits their generalization. To overcome this, we propose CityGen, a diffusion-based generative framework for label-free cross-city style synthesis. CityGen uses high-definition maps to impose structural constraints and city-level visual prompts to guide the generation of target-city-styled data. We also introduce CityTransfer-Bench, a geographically disjoint benchmark for cross-city generalization, and validate our approach on perception, segmentation, and planning tasks. Experiments show that CityGen consistently improves model robustness across cities while remaining scalable and label-efficient, providing a foundation for generalizable autonomous driving systems.
📝 Abstract
Autonomous driving systems are commonly trained and evaluated within limited geographic regions, which hinders their scalability when deployed in new cities. Significant domain shifts in appearance, road topology, and traffic patterns often cause severe performance degradation under cross-city deployment. Existing approaches based on domain adaptation, data augmentation, or synthetic data generation typically rely on labeled target data, city-specific annotations, or task-specific designs, which limits both their scalability and their usefulness for holistic evaluation. In this paper, we introduce CityTransfer-Bench, a geographically disjoint benchmark for evaluating cross-city generalization across perception, segmentation, and planning. We also propose CityGen, a diffusion-based generative framework that performs zero-label city adaptation via HD-map-conditioned synthesis guided by city-level visual prompts. Extensive experiments demonstrate that CityGen consistently improves cross-city robustness across multiple tasks, establishing a scalable and label-efficient foundation for generalizable autonomous driving.