CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence

📅 2026-03-30
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing open-source simulators model aerial and ground agents separately, which makes them a poor fit for embodied intelligence research on air-ground collaboration. This work proposes a unified, high-fidelity simulation framework that runs urban driving and multirotor drone flight within a single Unreal Engine process, keeping air and ground agents strictly consistent in space and time through a shared physics tick and rendering pipeline. The framework preserves the native CARLA and AirSim Python APIs and ROS 2 interfaces, so existing code can be reused without modification. It also supports integration of custom robot platforms, synchronized capture of up to 18 sensor modalities at each tick, and physics-accurate drone dynamics. The platform supports air-ground cooperative tasks, embodied navigation and vision-language action, multimodal dataset construction, and reinforcement learning policy training, and it keeps the AirSim flight stack, whose upstream development has been archived, evolving within a modern infrastructure.

📝 Abstract
The convergence of low-altitude economies, embodied intelligence, and air-ground cooperative systems creates growing demand for simulation infrastructure capable of jointly modeling aerial and ground agents within a single physically coherent environment. Existing open-source platforms remain domain-segregated: driving simulators lack aerial dynamics, while multirotor simulators lack realistic ground scenes. Bridge-based co-simulation introduces synchronization overhead and cannot guarantee strict spatial-temporal consistency. We present CARLA-Air, an open-source infrastructure that unifies high-fidelity urban driving and physics-accurate multirotor flight within a single Unreal Engine process. The platform preserves both CARLA and AirSim native Python APIs and ROS 2 interfaces, enabling zero-modification code reuse. Within a shared physics tick and rendering pipeline, CARLA-Air delivers photorealistic environments with rule-compliant traffic, socially-aware pedestrians, and aerodynamically consistent UAV dynamics, synchronously capturing up to 18 sensor modalities across all platforms at each tick. The platform supports representative air-ground embodied intelligence workloads spanning cooperation, embodied navigation and vision-language action, multi-modal perception and dataset construction, and reinforcement-learning-based policy training. An extensible asset pipeline allows integration of custom robot platforms into the shared world. By inheriting AirSim's aerial capabilities -- whose upstream development has been archived -- CARLA-Air ensures this widely adopted flight stack continues to evolve within a modern infrastructure. Released with prebuilt binaries and full source: https://github.com/louiszengCN/CarlaAir
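The shared-tick idea in the abstract can be illustrated with a toy sketch. This is not CARLA-Air code and uses neither the CARLA nor the AirSim API: the `Agent` and `SharedWorld` classes, the 1-D kinematics, and the 0.05 s step are all assumptions made for illustration. The point it shows is that when every agent is advanced and sampled inside one tick, all readings carry the same frame index, which a bridge between two separately stepped simulators cannot guarantee.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical agent with toy 1-D constant-velocity kinematics."""
    name: str
    velocity: float          # metres per second
    position: float = 0.0    # metres

    def step(self, dt: float) -> None:
        # Advance this agent by one fixed time step.
        self.position += self.velocity * dt


@dataclass
class SharedWorld:
    """Hypothetical world that steps all agents inside a single tick."""
    dt: float
    agents: list
    frame: int = 0

    def tick(self) -> dict:
        # 1) Advance every agent with the same time step.
        for agent in self.agents:
            agent.step(self.dt)
        self.frame += 1
        # 2) Sample every agent after stepping, so all readings
        #    are stamped with the same frame index.
        return {
            a.name: {"frame": self.frame, "position": a.position}
            for a in self.agents
        }


world = SharedWorld(dt=0.05, agents=[Agent("car", 10.0), Agent("drone", 4.0)])
snapshots = [world.tick() for _ in range(20)]  # 20 ticks = 1 simulated second
last = snapshots[-1]
print(last["car"]["frame"], last["drone"]["frame"])
```

In a real setup the per-agent `step` would be replaced by the engine's physics update and the returned dictionary by actual sensor payloads; the sketch only captures the ordering of stepping and sampling within one tick.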
Problem

Research questions and friction points this paper is trying to address.

air-ground simulation
embodied intelligence
multi-agent co-simulation
spatial-temporal consistency
unified simulation infrastructure
Innovation

Methods, ideas, or system contributions that make the work stand out.

air-ground simulation
embodied intelligence
unified physics engine
multi-modal sensing
CARLA-Air