🤖 AI Summary
This study addresses real-time catheter tip tracking in mechanical thrombectomy, where low-contrast, noisy X-ray fluoroscopy images and device occlusion hinder accurate localization. The authors propose a multi-threaded real-time tracking pipeline that, for the first time, integrates Transformer-based architectures, specifically SegFormer, into the catheter tip segmentation task. Post-processing combines two-step component filtering, one-pixel medial skeletonization, and greedy arc-length path following with a contour fallback, so that the tip can still be localized in complex clinical scenes. On manually labeled fluoroscopic video, the two-class SegFormer model achieves a mean absolute error of 4.44 mm, ahead of U-Net (4.60 mm) and U-Net+Transformer (6.20 mm). On segmentation benchmarks, the system improves Dice scores by up to 5% over state-of-the-art CathAction results for three-class segmentation. The pipeline is intended as a foundation for reinforcement learning-driven autonomous navigation systems.
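The greedy path-following step named in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the skeleton is already available as a set of `(row, col)` pixels, that a start pixel (for example, where the device enters the image) is known, and that the tip is wherever the walk runs out of unvisited neighbours. The names `trace_tip` and `neighbours_of` are hypothetical.

```python
import math

# Offsets of the 8-connected neighbourhood as (row, col) steps.
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def neighbours_of(pixel, skeleton):
    """Return the skeleton pixels that touch `pixel` (8-connectivity)."""
    r, c = pixel
    return [(r + dr, c + dc) for dr, dc in NEIGHBOURS if (r + dr, c + dc) in skeleton]


def trace_tip(skeleton, start):
    """Greedily walk a one-pixel-wide skeleton from `start`.

    Assumption: the tip is the pixel where the walk can no longer advance.
    Returns (tip_pixel, arc_length_in_pixels).
    """
    visited = {start}
    current = start
    arc_length = 0.0
    while True:
        candidates = [p for p in neighbours_of(current, skeleton) if p not in visited]
        if not candidates:
            return current, arc_length
        # Greedy choice: take the closest unvisited neighbour, so axial
        # steps (length 1) are preferred over diagonal ones (length sqrt(2)).
        nxt = min(candidates, key=lambda p: math.dist(p, current))
        arc_length += math.dist(nxt, current)
        visited.add(nxt)
        current = nxt
```

Multiplying the returned arc length by the image's pixel spacing would convert it to millimetres. A greedy walk like this never backtracks, so at a branch it commits to one arm; that is one reason a fallback, such as the contour fallback the source mentions, is useful.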
📝 Abstract
Purpose: Mechanical thrombectomy (MT) improves stroke outcomes, but its use is limited by a lack of local treatment access. Widely deployed reinforcement learning (RL)-based robotic systems could alleviate this limitation through autonomous navigation, but current RL methods require live tracking of the device tip coordinates to function. This paper aims to develop and evaluate a real-time catheter tip tracking pipeline under fluoroscopy, addressing challenges such as low contrast, noise, and device occlusion. Methods: A multi-threaded pipeline was designed, incorporating frame reading, preprocessing, inference, and post-processing. Deep learning segmentation models, including U-Net, U-Net+Transformer, and SegFormer, were trained and benchmarked using two-class and three-class formulations. Post-processing involved two-step component filtering, one-pixel medial skeletonization, and greedy arc-length path following with contour fallback. Results: On manually labeled, moderate-complexity fluoroscopic video data, the two-class SegFormer achieved a mean absolute error of 4.44 mm, outperforming U-Net (4.60 mm), U-Net+Transformer (6.20 mm), and all three-class models (5.19 to 7.74 mm). On segmentation benchmarks, the system exceeded state-of-the-art CathAction results, with improvements of up to +5% in Dice scores for three-class segmentation. Conclusion: The results demonstrate that the proposed multi-threaded tracking framework maintains stable performance under challenging imaging conditions and outperforms prior benchmarks, providing a reliable and efficient foundation for RL-based autonomous MT navigation.
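The four stages listed under Methods (frame reading, preprocessing, inference, post-processing) suggest a producer-consumer layout. The sketch below shows one way such a pipeline could be wired with Python's standard `queue` and `threading` modules. It is an illustration under assumptions, not the paper's code: `run_stage`, `run_pipeline`, and the `maxsize` parameter are hypothetical, each stage is a plain callable, and error handling is omitted (a stage that raises would stall the pipeline).

```python
import queue
import threading

_STOP = object()  # sentinel telling each stage to shut down


def run_stage(func, inbox, outbox):
    """Apply `func` to every item from `inbox` and forward results to `outbox`."""
    while True:
        item = inbox.get()
        if item is _STOP:
            outbox.put(_STOP)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(frames, stages, maxsize=4):
    """Run each callable in `stages` on its own thread, connected by bounded queues.

    Output order matches input order because every stage is a single
    thread reading from a FIFO queue.
    """
    queues = [queue.Queue(maxsize=maxsize) for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=run_stage, args=(func, queues[i], queues[i + 1]), daemon=True)
        for i, func in enumerate(stages)
    ]

    def read_frames():
        # Frame reading runs on its own thread so a full queue only
        # blocks the reader, not the code collecting results.
        for frame in frames:
            queues[0].put(frame)
        queues[0].put(_STOP)

    workers.append(threading.Thread(target=read_frames, daemon=True))
    for worker in workers:
        worker.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is _STOP:
            break
        results.append(item)
    for worker in workers:
        worker.join()
    return results
```

In a real tracker the stages would be preprocessing, model inference, and post-processing functions, and `frames` would be an iterator over the fluoroscopy stream. The bounded queues apply back-pressure, so a slow inference stage throttles frame reading instead of letting memory grow without limit.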