InspecSafe-V1: A Multimodal Benchmark for Safety Assessment in Industrial Inspection Scenarios

📅 2026-01-29
🤖 AI Summary
This work addresses the limited reliability of existing industrial AI systems in complex dynamic environments, primarily due to the absence of real-world, multimodal, and fine-grained annotated datasets. To bridge this gap, we present the first multimodal safety assessment benchmark for real-world industrial inspection, encompassing five representative scenarios and 5,013 inspection instances. The dataset was collected synchronously by 41 robots across 2,239 locations, capturing seven modalities: visible light, infrared, audio, depth, LiDAR point clouds, gas concentration, and temperature-humidity readings. Each instance is annotated with pixel-level segmentation masks, semantic descriptions, and safety-level labels. This benchmark substantially enhances the multimodal comprehension, anomaly detection, and safety-aware decision-making capabilities of industrial foundation models in challenging operational settings.

📝 Abstract
With the rapid development of industrial intelligence and unmanned inspection, reliable perception and safety assessment for AI systems in complex and dynamic industrial sites have become a key bottleneck for deploying predictive maintenance and autonomous inspection. Most public datasets remain limited by simulated data sources, single-modality sensing, or the absence of fine-grained object-level annotations, which prevents robust scene understanding and multimodal safety reasoning for industrial foundation models. To address these limitations, InspecSafe-V1 is released as the first multimodal benchmark dataset for industrial inspection safety assessment collected from routine operations of real inspection robots in real-world environments. InspecSafe-V1 covers five representative industrial scenarios: tunnels, power facilities, sintering equipment, oil and gas petrochemical plants, and coal conveyor trestles. The dataset is constructed from 41 wheeled and rail-mounted inspection robots operating at 2,239 valid inspection sites, yielding 5,013 inspection instances. For each instance, pixel-level segmentation annotations are provided for key objects in visible-spectrum images, together with a semantic scene description and a corresponding safety-level label defined according to practical inspection tasks. Seven synchronized sensing modalities are further provided: infrared video, audio, depth point clouds, radar point clouds, gas measurements, temperature, and humidity. These support multimodal anomaly recognition, cross-modal fusion, and comprehensive safety assessment in industrial environments.
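The abstract describes each inspection instance as one visible-spectrum image with a segmentation mask, a semantic description, a safety-level label, and seven synchronized sensing modalities. A minimal sketch of such a record might look as follows; all field names, file paths, and the four-level safety scale are hypothetical assumptions for illustration, not the dataset's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict

class SafetyLevel(Enum):
    # Hypothetical ordinal scale; the paper's actual label set may differ.
    NORMAL = 0
    ATTENTION = 1
    WARNING = 2
    DANGER = 3

@dataclass
class InspectionInstance:
    """One inspection instance, as a hypothetical record layout."""
    scenario: str          # e.g. "tunnel", "power_facility", "coal_trestle"
    site_id: int           # one of the 2,239 valid inspection sites
    rgb_path: str          # visible-spectrum image
    mask_path: str         # pixel-level segmentation of key objects
    description: str       # semantic scene description
    safety_level: SafetyLevel
    # Seven synchronized modalities keyed by name; paths are placeholders.
    modalities: Dict[str, str] = field(default_factory=dict)

# Example record (all values illustrative)
inst = InspectionInstance(
    scenario="tunnel",
    site_id=101,
    rgb_path="rgb/000101.png",
    mask_path="mask/000101.png",
    description="Conveyor belt surface with minor debris accumulation.",
    safety_level=SafetyLevel.ATTENTION,
    modalities={
        "infrared": "ir/000101.mp4",
        "audio": "audio/000101.wav",
        "depth": "depth/000101.pcd",
        "radar": "radar/000101.pcd",
        "gas": "gas/000101.csv",
        "temperature": "env/000101_temp.csv",
        "humidity": "env/000101_hum.csv",
    },
)
```

A structure like this makes the cross-modal fusion task concrete: a model consumes the seven modality streams plus the RGB frame and predicts the mask, description, and safety level.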
Problem

Research questions and friction points this paper is trying to address.

industrial inspection
safety assessment
multimodal perception
dataset limitation
scene understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal benchmark
industrial inspection
safety assessment
pixel-level annotation
cross-modal fusion
Zeyi Liu
Tsinghua University
Safety-guaranteed Control · Safety Assessment · Fault Diagnosis · Online Learning
Shuang Liu
Department of Automation, Tsinghua University, Beijing 100084, China
Jihai Min
TetraBOT Intelligence Co., Ltd., Nanjing 210000, China
Zhaoheng Zhang
TetraBOT Intelligence Co., Ltd., Nanjing 210000, China
Jun Cen
DAMO Academy, Alibaba Group, Hangzhou 311100, China
Pengyu Han
Tsinghua University
Fault Diagnosis · Machine Learning
Song Hu
Professor of Biomedical Engineering, Washington University in St. Louis
Photoacoustics · Biophotonics · Brain Imaging · Fiber Optics · Nanophotonics
Zihan Meng
TetraBOT Intelligence Co., Ltd., Nanjing 210000, China
Xiao He
Department of Automation, Tsinghua University, Beijing 100084, China
Donghua Zhou
School of Automation, Southeast University, Nanjing 210096, China