Robust Anomaly Detection through Multi-Modal Autoencoder Fusion for Small Vehicle Damage Detection

📅 2025-09-01
🤖 AI Summary
To address the challenge of detecting minor vehicle damage in a timely manner in shared and rental fleets, such as dents, scratches, and underbody impacts, this paper proposes a real-time anomaly detection method based on multi-modal autoencoders. Unlike vision-centric approaches, the method fuses vibration signals from an onboard IMU with acoustic signals from a microphone, both housed in a compact windshield-mounted device, and combines feature-level fusion with decision-level ensemble pooling. Because it does not rely on cameras or other resource-intensive sensors, the design supports low-resource, real-time deployment. The ensemble pooling multi-modal model achieves a ROC-AUC of 92%, the best result among the evaluated variants, and outperforms unimodal baselines and state-of-the-art methods. The framework can also be extended to broader automotive safety applications, such as airbag deployment and collision detection for autonomous vehicles.
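The summary mentions feature-level fusion of IMU and microphone signals. As a minimal sketch of what such early fusion can look like, assuming hypothetical per-window statistics (mean, RMS, peak-to-peak) rather than the paper's actual features, and treating each modality as a single 1-D channel for brevity, each modality is standardized with statistics from damage-free driving and the two feature vectors are concatenated into one input for a multi-modal autoencoder. The names `window_features`, `standardize`, and `fuse` are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Stats = Tuple[List[float], List[float]]  # (per-feature means, per-feature stds)

def window_features(signal: Sequence[float]) -> List[float]:
    """Hypothetical features for one window: [mean, RMS, peak-to-peak]."""
    n = len(signal)
    mean = sum(signal) / n
    rms = math.sqrt(sum(x * x for x in signal) / n)
    return [mean, rms, max(signal) - min(signal)]

def standardize(features: List[float], stats: Stats) -> List[float]:
    """Z-score each feature; stats would come from damage-free driving data."""
    mus, sigmas = stats
    return [(f - m) / s if s > 0 else 0.0 for f, m, s in zip(features, mus, sigmas)]

def fuse(imu_window: Sequence[float], audio_window: Sequence[float],
         imu_stats: Stats, audio_stats: Stats) -> List[float]:
    """Feature-level (early) fusion: standardize per modality, then concatenate."""
    return (standardize(window_features(imu_window), imu_stats)
            + standardize(window_features(audio_window), audio_stats))

# Identity statistics (mean 0, std 1) just to show the shape of the output.
unit: Stats = ([0.0, 0.0, 0.0], [1.0, 1.0, 1.0])
fused = fuse([0.0, 1.0, -1.0, 0.0], [0.5, -0.5, 0.5, -0.5], unit, unit)
print(len(fused))  # 6: three IMU features followed by three audio features
```

Standardizing before concatenation keeps one modality from dominating the autoencoder's reconstruction loss purely because of its physical units.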

📝 Abstract
Wear-and-tear detection in fleet and shared vehicle systems is a critical challenge, particularly in rental and car-sharing services, where minor damage, such as dents, scratches, and underbody impacts, often goes unnoticed or is detected too late. Manual inspection is currently the default approach, but it is labour-intensive and prone to human error. State-of-the-art image-based methods, meanwhile, struggle with real-time performance and are less effective at detecting underbody damage because of limited visual access and poor spatial coverage. This work introduces a novel multi-modal architecture based on anomaly detection to address these issues. Sensors such as IMUs and microphones are integrated into a compact device mounted on the vehicle's windshield. This approach supports real-time damage detection while avoiding the need for highly resource-intensive sensors. We developed multiple variants of multi-modal autoencoder-based architectures and evaluated them against unimodal and state-of-the-art methods. Our ensemble pooling multi-modal model achieved the highest performance, with an Area Under the Receiver Operating Characteristic Curve (ROC-AUC) of 92%, demonstrating its effectiveness in real-world applications. This approach can also be extended to other applications, such as improving automotive safety, where it can integrate with airbag systems for efficient deployment, and supporting autonomous vehicles by complementing other sensors in collision detection.

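For reference, ROC-AUC can be computed directly as the probability that a randomly chosen anomalous window (label 1) receives a higher anomaly score than a randomly chosen normal window (label 0), with ties counted as one half; this is the normalized Mann-Whitney U statistic. Read this way, the reported 92% means a damage window outranks a normal window about 92% of the time. The sketch below uses the quadratic pairwise form for clarity; `roc_auc` is an illustrative helper, not the authors' evaluation code.

```python
from typing import Sequence

def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC-AUC as P(score of a positive > score of a negative), ties count 0.5.

    Equivalent to the normalized Mann-Whitney U statistic; O(n_pos * n_neg).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # anomaly ranked above normal
            elif p == n:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))

print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

For large test sets, a rank-based O(n log n) formulation or scikit-learn's `roc_auc_score` yields the same value.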
Problem

Detects minor vehicle damage, such as dents, scratches, and underbody impacts, automatically
Overcomes limitations of manual inspections and image-based methods
Addresses real-time detection challenges with multi-modal sensor fusion
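Real-time operation implies scoring the sensor stream in short, overlapping windows rather than whole recordings. A minimal sketch of such a streaming segmenter follows, assuming fixed window and hop sizes; the paper's actual window length and sampling rates are not stated here, and `sliding_windows` is a hypothetical helper. It keeps only the most recent samples in a bounded buffer, so memory stays constant on a live feed.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, List

def sliding_windows(stream: Iterable[float], size: int, hop: int) -> Iterator[List[float]]:
    """Yield windows of `size` samples, emitting a new one every `hop` samples.

    The deque's maxlen discards the oldest sample automatically, so memory is O(size).
    """
    if size <= 0 or hop <= 0:
        raise ValueError("size and hop must be positive")
    buf: Deque[float] = deque(maxlen=size)
    since_last = 0
    for sample in stream:
        buf.append(sample)
        since_last += 1
        # Emit only once the buffer is full and `hop` new samples have arrived.
        if len(buf) == size and since_last >= hop:
            yield list(buf)
            since_last = 0

windows = list(sliding_windows(range(8), size=4, hop=2))
print(windows)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
```

Each emitted window could then be passed to a feature extractor and autoencoder; a hop smaller than the window size gives overlap, so a short impact is unlikely to fall on a window boundary.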
Innovation

Multi-modal autoencoder fusion architecture
Integration of IMU and microphone sensors in a compact windshield-mounted device
Ensemble pooling model achieving 92% ROC-AUC
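The list credits an ensemble pooling model with the best result. The exact pooling rule is not described here, so the following sketch assumes one common decision-level scheme: each autoencoder (for example, hypothetical IMU-only, audio-only, and fused models) scores a window by its mean squared reconstruction error; each error is z-normalized using its mean and standard deviation on damage-free validation data; and the normalized scores are averaged or max-pooled. Model names, numbers, and the pooling rule are all assumptions for illustration.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence, Tuple

# Hypothetical decision-level ensemble pooling; not the paper's confirmed rule.

def reconstruction_error(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between an input window and its autoencoder reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def calibrate(normal_errors: Dict[str, List[float]]) -> Dict[str, Tuple[float, float]]:
    """Per-model (mean, std) of reconstruction error on damage-free windows.

    Each list needs at least two values, because stdev is the sample standard deviation.
    """
    return {name: (mean(errs), stdev(errs)) for name, errs in normal_errors.items()}

def pooled_score(errors: Dict[str, float],
                 stats: Dict[str, Tuple[float, float]],
                 mode: str = "mean") -> float:
    """Z-normalize each model's error against its calibration stats, then pool."""
    z = [(errors[name] - mu) / max(sd, 1e-12) for name, (mu, sd) in stats.items()]
    if mode == "mean":
        return mean(z)
    if mode == "max":
        return max(z)
    raise ValueError(f"unknown pooling mode: {mode}")

# Made-up calibration errors and test-window errors for three hypothetical models.
stats = calibrate({"imu": [1.0, 2.0, 3.0], "audio": [0.0, 1.0, 2.0], "fused": [2.0, 4.0, 6.0]})
errors = {"imu": 4.0, "audio": 1.0, "fused": 8.0}
print(pooled_score(errors, stats))         # mean of the z-scores [2.0, 0.0, 2.0]
print(pooled_score(errors, stats, "max"))  # 2.0
```

A window would then be flagged as damage when the pooled score exceeds a threshold chosen on validation data. Mean pooling dampens a single noisy model, while max pooling reacts to whichever modality reconstructs the window worst.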
Sara Khan
Faculty of Mathematics and Computer Science, University of Bremen, Bremen, 28359, Lower Saxony, Germany; Engineering Software Communication, Robert Bosch GmbH, Renningen, 71272, Baden-Württemberg, Germany
Mehmed Yüksel
Robotics Innovation Center, Deutsches Forschungszentrum für Künstliche Intelligenz, Bremen, 28359, Lower Saxony, Germany
Frank Kirchner
Professor of Robotics, University of Bremen, DFKI
artificial intelligence, robotics, machine learning, Human-Machine-Interface, walking robots