CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence

📅 2026-03-30
🤖 AI Summary
Existing open-source simulation platforms struggle to support embodied intelligence research on air-ground collaboration because aerial and ground agents are modeled in separate simulators. This work proposes a unified, high-fidelity simulation framework that runs urban driving and multirotor drone flight within a single Unreal Engine process, keeping air and ground agents spatially and temporally consistent through a shared physics tick and rendering pipeline. The framework preserves the native CARLA and AirSim Python APIs and ROS 2 interfaces, enabling zero-modification code reuse. It also supports extensible integration of custom robots, synchronized capture of up to 18 sensor modalities, and physically accurate drone dynamics. Experiments demonstrate its effectiveness in air-ground cooperative tasks, embodied navigation and vision-language action, multimodal dataset generation, and reinforcement learning policy training. The platform also carries forward the AirSim flight stack, whose upstream development has been archived.
📝 Abstract
The convergence of low-altitude economies, embodied intelligence, and air-ground cooperative systems creates growing demand for simulation infrastructure capable of jointly modeling aerial and ground agents within a single physically coherent environment. Existing open-source platforms remain domain-segregated: driving simulators lack aerial dynamics, while multirotor simulators lack realistic ground scenes. Bridge-based co-simulation introduces synchronization overhead and cannot guarantee strict spatial-temporal consistency. We present CARLA-Air, an open-source infrastructure that unifies high-fidelity urban driving and physics-accurate multirotor flight within a single Unreal Engine process. The platform preserves both CARLA and AirSim native Python APIs and ROS 2 interfaces, enabling zero-modification code reuse. Within a shared physics tick and rendering pipeline, CARLA-Air delivers photorealistic environments with rule-compliant traffic, socially-aware pedestrians, and aerodynamically consistent UAV dynamics, synchronously capturing up to 18 sensor modalities across all platforms at each tick. The platform supports representative air-ground embodied intelligence workloads spanning cooperation, embodied navigation and vision-language action, multi-modal perception and dataset construction, and reinforcement-learning-based policy training. An extensible asset pipeline allows integration of custom robot platforms into the shared world. By inheriting AirSim's aerial capabilities -- whose upstream development has been archived -- CARLA-Air ensures this widely adopted flight stack continues to evolve within a modern infrastructure. Released with prebuilt binaries and full source: https://github.com/louiszengCN/CarlaAir
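The contrast the abstract draws between a shared physics tick and bridge-based co-simulation can be illustrated with a toy timestamp model. This is only a sketch under stated assumptions, not CARLA-Air code: the function names, the 50 ms and 30 ms step sizes, and the "pair each ground frame with the latest drone frame" bridging rule are all hypothetical and chosen for illustration.

```python
# Toy model (hypothetical, not part of CARLA-Air): compare sensor timestamp
# alignment under one shared world clock versus two bridged simulators.

def shared_tick_timestamps(step_ms, n_ticks):
    """One world clock: ground and aerial sensors are sampled in the same tick."""
    pairs = []
    t = 0
    for _ in range(n_ticks):
        t += step_ms
        pairs.append((t, t))  # (ground stamp, drone stamp) are identical
    return pairs


def bridged_timestamps(ground_step_ms, drone_step_ms, n_ticks):
    """Two independent clocks: an assumed bridge pairs each ground frame
    with the most recent drone frame produced at or before it."""
    pairs = []
    for i in range(1, n_ticks + 1):
        t_ground = i * ground_step_ms
        t_drone = (t_ground // drone_step_ms) * drone_step_ms
        pairs.append((t_ground, t_drone))
    return pairs


def max_skew_ms(pairs):
    """Largest timestamp gap between paired ground and drone samples."""
    return max(abs(g - d) for g, d in pairs)


if __name__ == "__main__":
    print("shared tick skew (ms):", max_skew_ms(shared_tick_timestamps(50, 5)))
    print("bridged skew (ms):", max_skew_ms(bridged_timestamps(50, 30, 5)))
```

In this toy setting the shared clock gives zero skew by construction, while the bridged pairing leaves a gap whenever the two step sizes do not line up. A real bridge could interpolate or lock-step the simulators, at the cost of the synchronization overhead the abstract mentions.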
Problem

Research questions and friction points this paper is trying to address.

air-ground simulation
embodied intelligence
multi-agent co-simulation
spatial-temporal consistency
unified simulation infrastructure
Innovation

Methods, ideas, or system contributions that make the work stand out.

air-ground simulation
embodied intelligence
unified physics engine
multi-modal sensing
CARLA-Air
Tianle Zeng
Shenzhen Key Laboratory of Robotics and Computer Vision, Southern University of Science and Technology
Hanxuan Chen
College of Electrical and Information Engineering, Hunan University
Yanci Wen
Shenzhen Key Laboratory of Robotics and Computer Vision, Southern University of Science and Technology
Hong Zhang
School of Cybersecurity and Computer Science, Hebei University
Big Data, Edge Computing, Information Security, AI