🤖 AI Summary
Existing open-source simulation platforms struggle to support embodied intelligence research on air-ground collaboration because they model aerial and ground agents separately. This work proposes CARLA-Air, a unified, high-fidelity simulation framework that integrates urban driving and multirotor flight within a single Unreal Engine process. A shared physics tick and rendering pipeline give air and ground agents strict spatiotemporal consistency. The framework preserves the native CARLA and AirSim Python APIs and ROS 2 interfaces, enabling zero-modification code reuse, and it offers extensible custom robot integration, synchronized capture of up to 18 sensor modalities at each tick, and physically accurate drone dynamics. The platform supports representative workloads in air-ground cooperation, embodied navigation and vision-language action, multimodal dataset generation, and reinforcement learning policy training. It also keeps the AirSim flight stack, whose upstream development has been archived, evolving within a modern infrastructure.
📝 Abstract
The convergence of low-altitude economies, embodied intelligence, and air-ground cooperative systems creates growing demand for simulation infrastructure capable of jointly modeling aerial and ground agents within a single physically coherent environment. Existing open-source platforms remain domain-segregated: driving simulators lack aerial dynamics, while multirotor simulators lack realistic ground scenes. Bridge-based co-simulation introduces synchronization overhead and cannot guarantee strict spatial-temporal consistency.
We present CARLA-Air, an open-source infrastructure that unifies high-fidelity urban driving and physics-accurate multirotor flight within a single Unreal Engine process. The platform preserves both the CARLA and AirSim native Python APIs and ROS 2 interfaces, enabling zero-modification code reuse. Within a shared physics tick and rendering pipeline, CARLA-Air delivers photorealistic environments with rule-compliant traffic, socially aware pedestrians, and aerodynamically consistent UAV dynamics, synchronously capturing up to 18 sensor modalities across all platforms at each tick. The platform supports representative air-ground embodied intelligence workloads spanning cooperation, embodied navigation and vision-language action, multimodal perception and dataset construction, and reinforcement-learning-based policy training. An extensible asset pipeline allows integration of custom robot platforms into the shared world. AirSim's upstream development has been archived; by inheriting its aerial capabilities, CARLA-Air ensures this widely adopted flight stack continues to evolve within a modern infrastructure.
Released with prebuilt binaries and full source: https://github.com/louiszengCN/CarlaAir
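The shared-tick idea described in the abstract can be illustrated with a toy sketch. This is not the CARLA-Air, CARLA, or AirSim API: `Agent`, `SharedWorld`, and `tick` are hypothetical names, and the constant-velocity motion model is an assumption made purely for illustration. The sketch shows only why a single tick loop gives every agent's sample the same frame number and timestamp, which a bridge between two separately clocked simulators cannot guarantee.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical agent with a constant velocity in m/s (illustrative only)."""
    name: str
    velocity: tuple
    position: tuple = (0.0, 0.0, 0.0)

    def step(self, dt):
        # Advance the position by one fixed time step.
        self.position = tuple(p + v * dt for p, v in zip(self.position, self.velocity))


@dataclass
class SharedWorld:
    """Hypothetical world that advances all agents inside one tick."""
    dt: float = 0.05
    frame: int = 0
    agents: list = field(default_factory=list)

    def tick(self):
        # One frame counter and one clock are shared by ground and aerial agents.
        self.frame += 1
        for agent in self.agents:
            agent.step(self.dt)
        stamp = self.frame * self.dt
        # Every sample taken in this tick carries the same frame and timestamp.
        return [
            {"agent": a.name, "frame": self.frame, "time": stamp, "position": a.position}
            for a in self.agents
        ]


world = SharedWorld(agents=[
    Agent("car", (10.0, 0.0, 0.0)),
    Agent("drone", (0.0, 5.0, 1.0)),
])
samples = [world.tick() for _ in range(3)]
```

Each element of `samples` holds one record per agent, and all records within an element share a frame and timestamp by construction, so downstream code can pair aerial and ground observations without interpolation.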