A New Dataset and Framework for Robust Road Surface Classification via Camera-IMU Fusion

📅 2026-01-28

🤖 AI Summary
Existing road surface classification methods are limited by reliance on a single sensing modality and by insufficient environmental diversity in datasets, which hinders generalization in complex real-world scenarios. To address these limitations, we propose a multimodal approach that fuses camera and IMU data through a lightweight bidirectional cross-attention module and an adaptive gating mechanism, which dynamically adjusts each modality's contribution to mitigate domain shift. We introduce ROAD, the first multimodal RSC dataset, comprising real-world, vision-only, and synthetic subsets with synchronized RGB-IMU capture. Experimental results show that our method improves performance by 1.4 percentage points on the PVS benchmark and by 11.6 percentage points on the ROAD multimodal subset, while maintaining high F1 scores under challenging conditions such as nighttime and heavy rain.

📝 Abstract
Road surface classification (RSC) is a key enabler for environment-aware predictive maintenance systems. However, existing RSC techniques often fail to generalize beyond narrow operational conditions due to limited sensing modalities and datasets that lack environmental diversity. This work addresses these limitations by introducing a multimodal framework that fuses images and inertial measurements using a lightweight bidirectional cross-attention module followed by an adaptive gating layer that adjusts modality contributions under domain shifts. Given the limitations of current benchmarks, especially regarding lack of variability, we introduce ROAD, a new dataset composed of three complementary subsets: (i) real-world multimodal recordings with RGB-IMU streams synchronized using a gold-standard industry datalogger, captured across diverse lighting, weather, and surface conditions; (ii) a large vision-only subset designed to assess robustness under adverse illumination and heterogeneous capture setups; and (iii) a synthetic subset generated to study out-of-distribution generalization in scenarios difficult to obtain in practice. Experiments show that our method achieves a +1.4 pp improvement over the previous state-of-the-art on the PVS benchmark and an +11.6 pp improvement on our multimodal ROAD subset, with consistently higher F1-scores on minority classes. The framework also demonstrates stable performance across challenging visual conditions, including nighttime, heavy rain, and mixed-surface transitions. These findings indicate that combining affordable camera and IMU sensors with multimodal attention mechanisms provides a scalable, robust foundation for road surface understanding, particularly relevant for regions where environmental variability and cost constraints limit the adoption of high-end sensing suites.
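The paper does not include an implementation, but the fusion design it describes (bidirectional cross-attention between image and IMU token streams, followed by a sigmoid gate that weighs each modality's pooled contribution) can be sketched in a few lines. The following is a minimal single-head NumPy illustration under stated assumptions: token counts, feature width, random projection weights, and mean pooling are all illustrative choices, not the authors' actual architecture or hyperparameters.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_src, kv_src, Wq, Wk, Wv):
    """Tokens in q_src attend over tokens in kv_src (single head)."""
    Q, K, V = q_src @ Wq, kv_src @ Wk, kv_src @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores, axis=-1) @ V

rng = np.random.default_rng(0)
d = 16                                 # illustrative feature width
img = rng.normal(size=(49, d))         # e.g. a flattened 7x7 CNN feature map
imu = rng.normal(size=(8, d))          # e.g. embedded IMU windows

# Bidirectional: each modality queries the other (random weights for illustration)
W = lambda: rng.normal(scale=d ** -0.5, size=(d, d))
img2imu = cross_attention(img, imu, W(), W(), W())  # image queries, IMU keys/values
imu2img = cross_attention(imu, img, W(), W(), W())  # IMU queries, image keys/values

# Pool each enriched stream to one vector per modality
f_img = img2imu.mean(axis=0)
f_imu = imu2img.mean(axis=0)

# Adaptive gate: a scalar in (0, 1) computed from both pooled vectors,
# so the network can down-weight an unreliable modality (e.g. camera at night)
w_gate = rng.normal(scale=d ** -0.5, size=(2 * d,))
g = 1.0 / (1.0 + np.exp(-np.concatenate([f_img, f_imu]) @ w_gate))
fused = g * f_img + (1.0 - g) * f_imu  # representation fed to the classifier head
```

The gate is what gives the abstract's claimed robustness under domain shift a plausible mechanism: when visual features degrade (nighttime, heavy rain), a learned gate can shift weight toward the IMU stream without retraining the backbones.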
Problem

Research questions and friction points this paper is trying to address.

road surface classification
environmental diversity
multimodal sensing
domain generalization
robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

camera-IMU fusion
cross-attention
adaptive gating
road surface classification
multimodal dataset
Authors

Willams de Lima Costa
Voxar Labs, Centro de Informática, Universidade Federal de Pernambuco, Av. Jornalista Aníbal Fernandes, s/n, Recife, 50.740-560, Brazil

Thifany Ketuli Silva de Souza
Voxar Labs, Centro de Informática, Universidade Federal de Pernambuco, Av. Jornalista Aníbal Fernandes, s/n, Recife, 50.740-560, Brazil

Jonas Ferreira Silva
Voxar Labs, Centro de Informática, Universidade Federal de Pernambuco, Av. Jornalista Aníbal Fernandes, s/n, Recife, 50.740-560, Brazil

Carlos Gabriel Bezerra Pereira
Voxar Labs, Centro de Informática, Universidade Federal de Pernambuco, Av. Jornalista Aníbal Fernandes, s/n, Recife, 50.740-560, Brazil

Bruno Reis Vila Nova
Voxar Labs, Centro de Informática, Universidade Federal de Pernambuco, Av. Jornalista Aníbal Fernandes, s/n, Recife, 50.740-560, Brazil

Leonardo Silvino Brito
Voxar Labs, Centro de Informática, Universidade Federal de Pernambuco, Av. Jornalista Aníbal Fernandes, s/n, Recife, 50.740-560, Brazil

Rafael Raider Leoni
Volkswagen Truck and Bus, Resende, 27537-803, Rio de Janeiro, Brazil

Juliano Silva
Volkswagen Truck and Bus, Resende, 27537-803, Rio de Janeiro, Brazil

Valter Ferreira
Volkswagen Truck and Bus, Resende, 27537-803, Rio de Janeiro, Brazil

Sibele Miguel Soares Neto
Stellantis Brasil, Porto Real, 27570-000, Rio de Janeiro, Brazil

Samantha Uehara
Volkswagen do Brasil, São Bernardo do Campo, 09823-901, São Paulo, Brazil

Daniel Giacometti Amaral
Embeddo, Volta Redonda, 27251-330, Rio de Janeiro, Brazil

João Marcelo Teixeira
Voxar Labs, Centro de Informática, Universidade Federal de Pernambuco, Av. Jornalista Aníbal Fernandes, s/n, Recife, 50.740-560, Brazil

Veronica Teichrieb
Professor of Computer Science, Federal University of Pernambuco

Cristiano Coelho de Araújo
Voxar Labs, Centro de Informática, Universidade Federal de Pernambuco, Av. Jornalista Aníbal Fernandes, s/n, Recife, 50.740-560, Brazil