🤖 AI Summary
This work addresses the degradation in polyp detection performance caused by haze, motion blur, and specular reflections in clinical endoscopic images. To tackle this challenge, the authors propose a lightweight Transformer architecture featuring a unidirectionally guided dual-decoder design that jointly optimizes deblurring and segmentation. Key innovations include a Global Attention Module (GAM) for cross-scale feature aggregation, a Deblurring-Segmentation Aligner (DSA) to facilitate inter-task feature transfer, and a Learnable Cosine Scheduler (LoCoS) for dynamically balancing multi-task learning. Evaluated on the Kvasir-SEG dataset, the model achieves Dice scores of 0.922 and 0.889 on clean and severely degraded images, respectively, while reducing parameter count by 90% compared to existing methods, demonstrating strong potential for edge deployment.
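To make the idea of cosine-scheduled multi-task balancing concrete, the sketch below shifts a loss weight from the deblurring task towards the segmentation task along a fixed cosine curve. This is an illustrative assumption and not the paper's LoCoS formulation, which is not specified in this summary. LoCoS is described as learnable, whereas this schedule is fixed, and the names `cosine_weight`, `combined_loss`, `w_start`, and `w_end` are hypothetical.

```python
import math


def cosine_weight(step, total_steps, w_start=1.0, w_end=0.0):
    """Cosine interpolation from w_start (step 0) to w_end (final step).

    Hypothetical helper: a generic fixed cosine schedule, not LoCoS itself.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp training progress to [0, 1] so out-of-range steps stay bounded.
    t = min(max(step / total_steps, 0.0), 1.0)
    return w_end + 0.5 * (w_start - w_end) * (1.0 + math.cos(math.pi * t))


def combined_loss(deblur_loss, seg_loss, step, total_steps):
    """Blend two task losses; the deblurring weight decays over training."""
    w = cosine_weight(step, total_steps)
    return w * deblur_loss + (1.0 - w) * seg_loss
```

With the default arguments the deblurring loss dominates early in training and the segmentation loss dominates late, with an equal blend at the midpoint. A learnable variant would presumably replace the fixed endpoints with trained parameters.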
📝 Abstract
Endoscopic image analysis is vital for colorectal cancer screening, yet real-world footage often suffers from lens fogging, motion blur, and specular highlights, which severely compromise automated polyp detection. We propose EndoCaver, a lightweight transformer with a unidirectionally guided dual-decoder architecture that jointly performs image deblurring and segmentation while significantly reducing computational complexity and model parameters. Specifically, it integrates a Global Attention Module (GAM) for cross-scale aggregation, a Deblurring-Segmentation Aligner (DSA) to transfer restoration cues, and a cosine-based scheduler (LoCoS) for stable multi-task optimisation. Experiments on the Kvasir-SEG dataset show that EndoCaver achieves 0.922 Dice on clean data and 0.889 under severe image degradation, surpassing state-of-the-art methods while reducing model parameters by 90%. These results demonstrate its efficiency and robustness, making it well-suited for on-device clinical deployment. Code is available at https://github.com/ReaganWu/EndoCaver.
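For readers unfamiliar with the reported metric, the Dice score measures the overlap between a predicted mask $P$ and a ground-truth mask $T$ as $2|P \cap T| / (|P| + |T|)$. The minimal sketch below computes it for flattened binary masks. The function name `dice_score` and the convention of returning 1.0 for two empty masks are assumptions made for illustration, not taken from the EndoCaver code.

```python
def dice_score(pred, target):
    """Dice coefficient for two flat binary masks (sequences of 0/1).

    Hypothetical helper for illustration; evaluation code in practice
    usually operates on thresholded 2-D arrays.
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels marked as polyp in both the prediction and the ground truth.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total
```

For example, a prediction `[1, 1, 0, 0]` against a ground truth `[1, 0, 0, 0]` shares one positive pixel out of three marked in total, giving a Dice score of 2/3.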