AI Summary
In colorectal cancer screening, deep learning models are often hindered by heavy reliance on labeled data and opaque decision-making. To address these limitations, we propose EndoFinder, a self-supervised, online polyp retrieval framework grounded in multi-view scene representation for interpretable and scalable endoscopic assistance. Our method jointly optimizes contrastive learning and image reconstruction to train a polyp-aware encoder; employs a scene-representation Transformer that fuses multi-view endoscopic observations to model polyps as 3D structures; and introduces a hash-based discretization mechanism enabling efficient, transparent real-time retrieval. Evaluated on both public benchmarks and a newly curated dataset, EndoFinder achieves significant improvements in polyp re-identification accuracy (+8.2% mAP) and pathological classification (up to +5.7% F1-score), while reducing annotation dependency by >90%. The framework delivers clinically actionable, human-interpretable AI support for colonoscopy without requiring exhaustive pixel-level annotations.
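The joint objective can be illustrated with a small, dependency-free Python sketch. This is not the EndoFinder implementation, which lives in the repository linked in the abstract. It makes three assumptions:

- The contrastive term is InfoNCE over in-batch negatives.
- The reconstruction term is a plain pixel-wise MSE.
- The weight `lam` and temperature `tau` take hypothetical values.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embeddings (epsilon avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-12)


def info_nce(z1: List[Vector], z2: List[Vector], tau: float = 0.1) -> float:
    """InfoNCE: z1[i] and z2[i] embed two augmented views of image i (the positive
    pair); every z2[j] with j != i serves as an in-batch negative for z1[i]."""
    n = len(z1)
    total = 0.0
    for i in range(n):
        logits = [cosine(z1[i], z2[j]) / tau for j in range(n)]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / n


def reconstruction_mse(pred: Vector, target: Vector) -> float:
    """Mean squared error between the decoded image and the input (flattened pixels)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def joint_loss(z1, z2, pred, target, lam: float = 1.0, tau: float = 0.1) -> float:
    """Contrastive term plus lam times the reconstruction term (lam, tau are hypothetical)."""
    return info_nce(z1, z2, tau) + lam * reconstruction_mse(pred, target)


# Matched view pairs give a far lower loss than deliberately swapped pairs.
aligned = joint_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5], [0.5, 0.5])
swapped = joint_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]], [0.5, 0.5], [0.5, 0.5])
print(aligned < swapped)  # True
```

A real training loop would compute these terms on tensors with automatic differentiation. Plain lists are used here only to keep the sketch self-contained.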
Abstract
Colorectal cancer (CRC) remains a leading cause of cancer-related mortality, underscoring the importance of timely polyp detection and diagnosis. While deep learning models have improved optical-assisted diagnostics, they often demand extensive labeled datasets and yield "black-box" outputs with limited interpretability. In this paper, we propose EndoFinder, an online polyp retrieval framework that leverages multi-view scene representations for explainable and scalable CRC diagnosis. First, we develop a Polyp-aware Image Encoder by combining contrastive learning and a reconstruction task, guided by polyp segmentation masks. This self-supervised approach captures robust features without relying on large-scale annotated data. Next, we treat each polyp as a three-dimensional "scene" and introduce a Scene Representation Transformer, which fuses multiple views of the polyp into a single latent representation. By discretizing this representation through a hashing layer, EndoFinder enables real-time retrieval from a compiled database of historical polyp cases, where the associated diagnostic information serves as an interpretable reference for new queries. We evaluate EndoFinder on both public and newly collected polyp datasets for re-identification and pathology classification. Results show that EndoFinder outperforms existing methods in accuracy while providing transparent, retrieval-based insights for clinical decision-making. By contributing a novel dataset and a scalable, explainable framework, our work addresses key challenges in polyp diagnosis and offers a promising direction for more efficient AI-driven colonoscopy workflows. The source code is available at https://github.com/ku262/EndoFinder-Scene.
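The retrieval path described in the abstract can be sketched in plain Python. Its steps are: fuse the views of one polyp, hash the fused vector, search a case database, and read off the neighbours' diagnoses. Several pieces are assumptions rather than the paper's design:

- Mean pooling stands in for the Scene Representation Transformer.
- Sign binarization stands in for the learned hashing layer.
- A majority vote over the k nearest cases is one hypothetical way to turn retrieved diagnoses into a pathology prediction.

`Case`, `retrieve`, and the sample records are illustrative names and made-up data, not real cases.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def fuse_views(view_embeddings: List[Vector]) -> List[float]:
    """Hypothetical stand-in for the Scene Representation Transformer:
    mean-pool the per-view embeddings of one polyp into a single scene vector."""
    n = len(view_embeddings)
    dim = len(view_embeddings[0])
    return [sum(v[d] for v in view_embeddings) / n for d in range(dim)]


def hash_code(vector: Vector) -> int:
    """Sign binarization (assumed): bit d is 1 if component d is positive."""
    code = 0
    for d, x in enumerate(vector):
        if x > 0:
            code |= 1 << d
    return code


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed hash codes (XOR, then popcount)."""
    return bin(a ^ b).count("1")


@dataclass
class Case:
    """A historical polyp case with its hash code and recorded pathology (hypothetical fields)."""
    case_id: str
    code: int
    pathology: str


def retrieve(query_views: List[Vector], database: List[Case], k: int = 3) -> Tuple[str, List[Case]]:
    """Return a majority-vote pathology plus the k nearest cases used as references."""
    query = hash_code(fuse_views(query_views))
    neighbours = sorted(database, key=lambda c: hamming(query, c.code))[:k]
    vote = Counter(c.pathology for c in neighbours).most_common(1)[0][0]
    return vote, neighbours


# Illustrative database of made-up cases with 4-bit codes.
db = [Case("A", 0b0011, "adenoma"), Case("B", 0b0111, "adenoma"), Case("C", 0b1100, "hyperplastic")]
label, refs = retrieve([[0.9, 0.4, -0.2, -0.5], [0.7, 0.6, -0.4, -0.1]], db, k=2)
print(label, [c.case_id for c in refs])  # adenoma ['A', 'B']
```

Returning the neighbouring cases alongside the vote is what makes such a prediction inspectable: a clinician can review the retrieved historical polyps instead of trusting an opaque score. Comparing packed binary codes with XOR and a bit count is also the usual reason hashing makes search over a large case database fast.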