PipeMFL-240K: A Large-scale Dataset and Benchmark for Object Detection in Pipeline Magnetic Flux Leakage Imaging

📅 2026-02-04

🤖 AI Summary

This work addresses a long-standing limitation in automatic defect detection for magnetic flux leakage (MFL) imaging: the absence of large-scale public datasets and standardized benchmarks. To bridge this gap, the authors introduce PipeMFL-240K, the first large-scale object detection dataset and benchmark designed specifically for pipeline MFL pseudo-color images. It comprises more than 240,000 images and more than 190,000 high-quality bounding-box annotations across 12 defect categories, and it captures real-world challenges such as a long-tailed class distribution, small object scales, and high intra-class variability. Using this dataset, the authors systematically evaluate state-of-the-art detection models, expose their performance bottlenecks in MFL scenarios, and establish a reproducible platform for future algorithm development and pipeline integrity assessment.

📝 Abstract

Pipeline integrity is critical to industrial safety and environmental protection, with Magnetic Flux Leakage (MFL) detection being a primary non-destructive testing technology. Despite the promise of deep learning for automating MFL interpretation, progress toward reliable models has been constrained by the absence of a large-scale public dataset and benchmark, making fair comparison and reproducible evaluation difficult. We introduce PipeMFL-240K, a large-scale, meticulously annotated dataset and benchmark for complex object detection in pipeline MFL pseudo-color images. PipeMFL-240K reflects real-world inspection complexity and poses several unique challenges: (i) an extremely long-tailed distribution over 12 categories, (ii) a high prevalence of tiny objects that often comprise only a handful of pixels, and (iii) substantial intra-class variability. The dataset contains 240,320 images and 191,530 high-quality bounding-box annotations, collected from 11 pipelines spanning approximately 1,480 km. Extensive experiments are conducted with state-of-the-art object detectors to establish baselines. Results show that modern detectors still struggle with the intrinsic properties of MFL data, highlighting considerable headroom for improvement, while PipeMFL-240K provides a reliable and challenging testbed to drive future research. As the first public dataset and the first benchmark of this scale and scope for pipeline MFL inspection, it provides a critical foundation for efficient pipeline diagnostics as well as maintenance planning and is expected to accelerate algorithmic innovation and reproducible research in MFL-based pipeline integrity assessment.
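The two dataset properties the abstract emphasizes, a long-tailed class distribution and a high share of tiny objects, can be quantified with simple statistics. The following is a minimal sketch, not the authors' analysis code. It assumes annotations are available as `(category, width_px, height_px)` tuples. The function name `dataset_stats`, the 16 x 16 pixel area threshold for "tiny", and the toy data are all hypothetical and do not come from the paper.

```python
from collections import Counter


def dataset_stats(annotations, tiny_area=16 * 16):
    """Summarize class imbalance and tiny-object prevalence.

    annotations: iterable of (category, width_px, height_px) tuples.
    tiny_area:   assumed area threshold in pixels; the paper's own
                 definition of "tiny" may differ.
    """
    annotations = list(annotations)
    if not annotations:
        raise ValueError("annotations must not be empty")

    # Number of boxes per category.
    counts = Counter(cat for cat, _, _ in annotations)

    # Imbalance ratio: most frequent class count over least frequent.
    imbalance = max(counts.values()) / min(counts.values())

    # Fraction of boxes whose area falls below the assumed threshold.
    tiny = sum(1 for _, w, h in annotations if w * h < tiny_area)

    return {
        "per_class": dict(counts),
        "imbalance_ratio": imbalance,
        "tiny_fraction": tiny / len(annotations),
    }


# Toy, made-up annotations used only to exercise the function.
toy = [("A", 8, 6)] * 6 + [("B", 40, 30)] * 3 + [("C", 10, 10)]
stats = dataset_stats(toy)
print(stats)
```

A larger imbalance ratio indicates a heavier tail, and a larger tiny fraction indicates more boxes that cover only a few pixels. From the figures in the abstract alone, 191,530 annotations over 240,320 images works out to roughly 0.80 boxes per image on average.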

Problem

Research questions and friction points this paper is trying to address.

Magnetic Flux Leakage

Object Detection

Pipeline Inspection

Dataset

Benchmark

Innovation

Methods, ideas, or system contributions that make the work stand out.

Magnetic Flux Leakage (MFL)

large-scale dataset

object detection