🤖 AI Summary
This work addresses a limitation of existing terrain understanding methods: they rely on robot-specific annotations or predefined semantic categories, which hinders generalization across platforms and fails to capture the visual diversity of natural terrains. To overcome this, we propose Trinity, a unified Transformer-based architecture that integrates class-agnostic visual terrain segmentation with class-specific semantic segmentation in a single network. Trinity's terrain segmentation partitions the scene solely by visual appearance, without predefined semantic labels or robot-dependent traversability scores. We extend the OAISYS simulator to generate RUGDSynth, a synthetic dataset inspired by RUGD with class-agnostic terrain samples, and introduce EXTerra, a real-world dataset annotated with both class-specific and class-agnostic terrain labels. Experiments demonstrate the feasibility of the proposed task and the effectiveness of joint segmentation in complex outdoor environments. The resulting robot-agnostic terrain priors are intended to support downstream tasks such as traversability estimation, visual odometry, and mission planning.
📝 Abstract
Terrain understanding is fundamental for mobile robots operating in unstructured outdoor environments. Existing vision-based traversability estimation methods rely on robot-specific annotations or semantic class mappings. This limits transferability across platforms and requires costly re-annotation whenever robot capabilities change. Standard semantic segmentation methods, in turn, focus only on a fixed set of predefined classes, which cannot capture the variety of natural terrains. In this work, we propose Trinity, a transformer-based architecture that jointly performs class-specific semantic segmentation and class-agnostic terrain segmentation within a single unified network. Terrain regions are segmented solely by visual appearance, without predefined semantic labels or robot-dependent traversability scores. This formulation enables the learning of robot-agnostic visual terrain priors that can be combined with robot-specific experience for downstream tasks such as traversability estimation, visual odometry, and mission planning. To enable large-scale training on diverse terrain appearances, we extend the OAISYS simulator and introduce RUGDSynth, a synthetic dataset inspired by RUGD with class-agnostic terrain samples. Furthermore, we present the EXTerra Dataset, which provides real-world images annotated with both class-specific and class-agnostic terrain labels. Experiments demonstrate the feasibility of the proposed task and the effectiveness of our joint segmentation approach in complex outdoor environments. Code and datasets will be released with this publication (after review).
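The abstract says that class-agnostic terrain regions can be combined with robot-specific experience for traversability estimation, but it does not say how. The sketch below is a hypothetical illustration under stated assumptions, not the authors' method: it assumes the network outputs an integer map of terrain-segment IDs with no semantic meaning, and that the robot has recorded sparse traversal costs at a few pixels (for example, from projected footprints). Each segment receives the mean cost of the samples that fall inside it, and segments with no samples stay unknown (NaN). The function name `propagate_traversability` and its parameters are invented for this example.

```python
import numpy as np


def propagate_traversability(segment_map, sample_rows, sample_cols, sample_costs):
    """Hypothetical: spread sparse robot-experience costs over class-agnostic segments.

    segment_map:  (H, W) array of non-negative integer segment IDs (IDs carry no semantics).
    sample_rows, sample_cols: 1-D pixel coordinates where the robot recorded a cost.
    sample_costs: 1-D cost values at those pixels (assumed scale, e.g. 0 = easy, 1 = hard).
    Returns an (H, W) float map; segments without any sample are NaN (unknown).
    """
    segment_map = np.asarray(segment_map, dtype=np.int64)
    # Look up which terrain segment each experience sample falls into.
    ids_at_samples = segment_map[sample_rows, sample_cols]
    n_segments = int(segment_map.max()) + 1
    # Per-segment sum of costs and number of samples.
    sums = np.bincount(ids_at_samples, weights=sample_costs, minlength=n_segments)
    counts = np.bincount(ids_at_samples, minlength=n_segments)
    # Mean cost per segment; 0 / 0 yields NaN for segments the robot never visited.
    with np.errstate(invalid="ignore", divide="ignore"):
        segment_cost = sums / counts
    # Broadcast each segment's cost back to every pixel of that segment.
    return segment_cost[segment_map]


if __name__ == "__main__":
    segments = np.array([[0, 0, 1],
                         [0, 2, 1]])
    cost_map = propagate_traversability(segments, [0, 1], [0, 2], [0.2, 0.8])
    print(cost_map)
```

Because the segments are defined by appearance alone, the same segment map could in principle be reused by robots with different capabilities; only the sparse cost samples, and therefore the per-segment averages, would change.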