PipeMFL-240K: A Large-scale Dataset and Benchmark for Object Detection in Pipeline Magnetic Flux Leakage Imaging

2026-02-04
AI Summary
This work addresses a long-standing limitation in automatic defect detection for magnetic flux leakage (MFL) imaging: the absence of large-scale public datasets and standardized benchmarks. To bridge this gap, we introduce PipeMFL-240K, the first large-scale object detection dataset and benchmark specifically designed for pipeline MFL pseudocolor images. It comprises over 240,000 images and 190,000 high-quality bounding box annotations across 12 defect categories, faithfully capturing real-world challenges such as long-tailed class distribution, small object scales, and high intra-class variability. Leveraging this dataset, we conduct a systematic evaluation of state-of-the-art detection models, revealing their performance bottlenecks in MFL scenarios and establishing a reliable, reproducible platform to foster future algorithmic innovation and pipeline integrity assessment.

๐Ÿ“ Abstract
Pipeline integrity is critical to industrial safety and environmental protection, with Magnetic Flux Leakage (MFL) detection being a primary non-destructive testing technology. Despite the promise of deep learning for automating MFL interpretation, progress toward reliable models has been constrained by the absence of a large-scale public dataset and benchmark, making fair comparison and reproducible evaluation difficult. We introduce PipeMFL-240K, a large-scale, meticulously annotated dataset and benchmark for complex object detection in pipeline MFL pseudo-color images. PipeMFL-240K reflects real-world inspection complexity and poses several unique challenges: (i) an extremely long-tailed distribution over 12 categories, (ii) a high prevalence of tiny objects that often comprise only a handful of pixels, and (iii) substantial intra-class variability. The dataset contains 240,320 images and 191,530 high-quality bounding-box annotations, collected from 11 pipelines spanning approximately 1,480 km. Extensive experiments are conducted with state-of-the-art object detectors to establish baselines. Results show that modern detectors still struggle with the intrinsic properties of MFL data, highlighting considerable headroom for improvement, while PipeMFL-240K provides a reliable and challenging testbed to drive future research. As the first public dataset and the first benchmark of this scale and scope for pipeline MFL inspection, it provides a critical foundation for efficient pipeline diagnostics as well as maintenance planning and is expected to accelerate algorithmic innovation and reproducible research in MFL-based pipeline integrity assessment.
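The dataset properties named in the abstract (annotation density, long-tailed classes, tiny objects) can be quantified with a few simple statistics. The following is a minimal Python sketch, not the authors' tooling: only the three constants come from the abstract, while the function names, the `(width, height)` box representation, and the `max_area` threshold are hypothetical choices made for illustration. The dataset's actual annotation format is not described here.

```python
from collections import Counter

# Figures quoted in the abstract.
NUM_IMAGES = 240_320
NUM_BOXES = 191_530
PIPELINE_KM = 1_480  # approximate, per the abstract


def boxes_per_image(num_boxes, num_images):
    """Average number of annotated boxes per image."""
    return num_boxes / num_images


def imbalance_ratio(labels):
    """Ratio of the most frequent class count to the least frequent one.

    A large value indicates a long-tailed class distribution.
    `labels` is a hypothetical flat list of per-box class names.
    """
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def tiny_fraction(box_sizes, max_area):
    """Fraction of boxes whose pixel area is at most `max_area`.

    `box_sizes` is a hypothetical list of (width, height) pairs in pixels;
    `max_area` is an illustrative threshold, not one defined by the paper.
    """
    if not box_sizes:
        return 0.0
    tiny = sum(1 for w, h in box_sizes if w * h <= max_area)
    return tiny / len(box_sizes)


if __name__ == "__main__":
    print(f"boxes per image: {boxes_per_image(NUM_BOXES, NUM_IMAGES):.3f}")
    print(f"images per km:   {NUM_IMAGES / PIPELINE_KM:.1f}")
```

With the abstract's figures, the density works out to roughly 0.8 boxes per image, so a share of the images necessarily carries no annotation at all. The other two helpers need the per-box labels and sizes from the released annotations before they report anything meaningful.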
Problem

Research questions and friction points this paper is trying to address.

Magnetic Flux Leakage
Object Detection
Pipeline Inspection
Dataset
Benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

Magnetic Flux Leakage (MFL)
large-scale dataset
object detection
pipeline inspection
long-tailed distribution
Tianyi Qu
SINOMACH Sensing Tech Co., Ltd, Shenyang, Liaoning, China
Songxiao Yang
Institute of Science Tokyo, Tokyo, Japan
Haolin Wang
Ph.D. Student. Georgia Institute of Technology
infrastructure monitoring, asset management, AI, ML, computer vision
Huadong Song
SINOMACH Sensing Tech Co., Ltd, Shenyang, Liaoning, China
Xiaoting Guo
SINOMACH Sensing Tech Co., Ltd, Shenyang, Liaoning, China
Wenguang Hu
SINOMACH Sensing Tech Co., Ltd, Shenyang, Liaoning, China
Guanlin Liu
ByteDance
Language Model, Reinforcement Learning, Machine learning, Statistics
Honghe Chen
SINOMACH Sensing Tech Co., Ltd, Shenyang, Liaoning, China
Yafei Ou
Tokyo Institute of Technology
Medical Image Analysis, Machine Learning, Computer Vision