🤖 AI Summary
This work addresses the lack of realistic, multimodal, cross-view datasets with overlapping observations for cooperative perception in unstructured environments. To bridge this gap, the authors present a dataset collected with a heterogeneous aerial-ground robot team (a Clearpath Husky UGV and an Autel EVO II UAV) under early-spring conditions with sparse tree canopies. The dataset comprises over 13,000 synchronized multimodal frames captured across four distinct unstructured environments, with LiDAR, stereo vision, IMU, and GPS data from the ground robot and RGB, thermal infrared, and GPS data from the aerial robot. It further includes more than 8,000 manually annotated images alongside SAM 3-based zero-shot segmentation. Designed to support cross-view fusion, traversability estimation, and collaborative scene understanding, the dataset targets real-world aerial-ground cooperative perception under partial occlusion.
📝 Abstract
Heterogeneous air-ground robot teams combine complementary sensing modalities, mobility characteristics, and spatial viewpoints that can significantly enhance perception in complex outdoor environments. However, progress in multi-robot collaborative perception has been constrained by the lack of real-world datasets featuring overlapping multi-modal observations from platforms operating in unstructured terrain. We present GA3T (Ground-Aerial Team for Terrain Traversal), a real-world multi-robot collaborative perception dataset collected using a Clearpath Husky UGV and an Autel EVO II UAV across diverse unstructured environments, including forest trails, rocky paths, muddy terrain, snow piles, and grass-covered fields. The ground platform provides 3D LiDAR, stereo camera, IMU, and GPS data, while the aerial platform contributes RGB imagery, thermal/infrared observations, and GPS from a complementary overhead viewpoint, enabling rich cross-modal and cross-view perception. The dataset was collected in 4 unique environments, with over 13,000 synchronized frames across approximately 29 minutes of operation, and includes both SAM 3-based zero-shot segmentation and over 8,000 manually labeled images. A unique aspect of the dataset is its early-spring collection period, during which sparse tree canopies let the aerial robot partially observe the ground robot and terrain through the trees, supporting occlusion-aware collaborative perception. Unlike prior multi-robot datasets that focus on SLAM or simulated cooperative driving, GA3T is specifically designed to support research on cross-view perception, air-ground viewpoint fusion, traversability estimation, and collaborative scene understanding in real off-road environments.