EndoFinder: Online Lesion Retrieval for Explainable Colorectal Polyp Diagnosis Leveraging Latent Scene Representations

📅 2025-07-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In colorectal cancer screening, deep learning models are often hindered by heavy reliance on labeled data and opaque decision-making. To address these limitations, we propose EndoFinder—a self-supervised, online polyp retrieval framework grounded in multi-view scene representation for interpretable and scalable endoscopic assistance. Our method jointly optimizes contrastive learning and image reconstruction to train a polyp-aware encoder; employs a scene-representation Transformer that fuses multi-view endoscopic observations to model polyps as 3D structures; and introduces a hash-based discretization mechanism enabling efficient, transparent real-time retrieval. Evaluated on both public benchmarks and a newly curated dataset, EndoFinder achieves significant improvements in polyp re-identification accuracy (+8.2% mAP) and pathological classification (up to +5.7% F1-score), while reducing annotation dependency by >90%. The framework delivers clinically actionable, human-interpretable AI support for colonoscopy without requiring exhaustive pixel-level annotations.

Technology Category

Application Category

📝 Abstract

Colorectal cancer (CRC) remains a leading cause of cancer-related mortality, underscoring the importance of timely polyp detection and diagnosis. While deep learning models have improved optical-assisted diagnostics, they often demand extensive labeled datasets and yield "black-box" outputs with limited interpretability. In this paper, we propose EndoFinder, an online polyp retrieval framework that leverages multi-view scene representations for explainable and scalable CRC diagnosis. First, we develop a Polyp-aware Image Encoder by combining contrastive learning and a reconstruction task, guided by polyp segmentation masks. This self-supervised approach captures robust features without relying on large-scale annotated data. Next, we treat each polyp as a three-dimensional "scene" and introduce a Scene Representation Transformer, which fuses multiple views of the polyp into a single latent representation. By discretizing this representation through a hashing layer, EndoFinder enables real-time retrieval from a compiled database of historical polyp cases, where diagnostic information serves as interpretable references for new queries. We evaluate EndoFinder on both public and newly collected polyp datasets for re-identification and pathology classification. Results show that EndoFinder outperforms existing methods in accuracy while providing transparent, retrieval-based insights for clinical decision-making. By contributing a novel dataset and a scalable, explainable framework, our work addresses key challenges in polyp diagnosis and offers a promising direction for more efficient AI-driven colonoscopy workflows. The source code is available at https://github.com/ku262/EndoFinder-Scene.

Problem

Research questions and friction points this paper is trying to address.

Improves colorectal polyp diagnosis with explainable AI

Reduces reliance on large labeled datasets via self-supervised learning

Enables real-time retrieval of similar historical polyp cases

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised polyp-aware image encoder

Scene Representation Transformer for 3D fusion

Real-time retrieval via hashing layer

🔎 Similar Papers

A Survey on Deep Learning for Polyp Segmentation: Techniques, Challenges and Future Trends

2023-11-30Visual IntelligenceCitations: 7

Authors to Follow