AI Summary
This work addresses the challenges of weak generalization and low boundary accuracy in visual fault detection for freight trains, which arise from repetitive components, occlusions, and contamination. To overcome these issues, the authors propose a lightweight self-prompting instance segmentation framework built upon the Segment Anything Model (SAM). The approach introduces an innovative self-prompt generation mechanism to enable efficient knowledge transfer from the foundation model to the specific domain, while employing a Tiny Vision Transformer backbone to facilitate real-time deployment on edge devices. Evaluated on a newly curated real-world freight train dataset, the model achieves 74.6 AP^box and 74.2 AP^mask, outperforming current state-of-the-art methods and striking an effective balance among accuracy, robustness, and computational efficiency.
Abstract
Accurate visual fault detection in freight trains remains a critical challenge for intelligent transportation system maintenance due to complex operational environments, structurally repetitive components, and frequent occlusion or contamination in safety-critical regions. Conventional instance segmentation methods based on convolutional neural networks and Transformers often suffer from poor generalization and limited boundary accuracy under such conditions. To address these challenges, we propose a lightweight self-prompted instance segmentation framework tailored for freight train fault detection. Our method leverages the Segment Anything Model by introducing a self-prompt generation module that automatically produces task-specific prompts, enabling effective knowledge transfer from foundation models to domain-specific inspection tasks. In addition, we adopt a Tiny Vision Transformer backbone to reduce computational cost, making the framework suitable for real-time deployment on edge devices in railway monitoring systems. We construct a domain-specific dataset collected from real-world freight inspection stations and conduct extensive evaluations. Experimental results show that our method achieves 74.6 $AP^{\text{box}}$ and 74.2 $AP^{\text{mask}}$ on the dataset, outperforming existing state-of-the-art methods in both accuracy and robustness while maintaining low computational overhead. This work offers a deployable and efficient vision solution for automated freight train inspection, demonstrating the potential of foundation model adaptation in industrial-scale fault diagnosis scenarios. Project page: https://github.com/MVME-HBUT/SAM_FTI-FDet.git
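The abstract's pipeline (lightweight image encoder → self-prompt generation → prompt-conditioned mask decoding) can be sketched conceptually as below. This is a minimal toy illustration of the *idea* of replacing user-supplied prompts with automatically generated ones, not the authors' implementation: every class name, feature shape, and the thresholding heuristic used to derive a box prompt are assumptions made here for clarity.

```python
# Conceptual sketch of a self-prompted segmentation pipeline.
# All names, shapes, and heuristics below are illustrative assumptions,
# NOT the paper's actual architecture or the SAM API.
import numpy as np

class TinyEncoder:
    """Stand-in for a lightweight (Tiny ViT-style) image encoder."""
    def __call__(self, image):
        # Pool the image into a coarse feature map (H/16 x W/16).
        h, w = image.shape
        return image.reshape(h // 16, 16, w // 16, 16).mean(axis=(1, 3))

class SelfPromptGenerator:
    """Derives a box prompt from encoder features instead of user clicks."""
    def __call__(self, features, threshold=0.5):
        ys, xs = np.where(features > threshold)
        if len(ys) == 0:
            return []
        # One bounding-box prompt around the salient region (feature coords).
        return [(xs.min(), ys.min(), xs.max(), ys.max())]

class MaskDecoder:
    """Turns a box prompt into a binary mask (here: simply fills the box)."""
    def __call__(self, features, box):
        x0, y0, x1, y1 = box
        mask = np.zeros_like(features, dtype=bool)
        mask[y0:y1 + 1, x0:x1 + 1] = True
        return mask

def segment(image):
    features = TinyEncoder()(image)
    prompts = SelfPromptGenerator()(features)   # no human in the loop
    decoder = MaskDecoder()
    return [decoder(features, box) for box in prompts]

# Toy input: a bright 32x32 "fault" patch on a dark 128x128 background.
img = np.zeros((128, 128))
img[32:64, 64:96] = 1.0
masks = segment(img)
print(len(masks), int(masks[0].sum()))  # → 1 4
```

The point of the sketch is the control flow: prompts are a function of the encoder's own features, so the prompt-driven foundation-model interface can run fully automatically on inspection imagery.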