EndoFinder: Online Lesion Retrieval for Explainable Colorectal Polyp Diagnosis Leveraging Latent Scene Representations

πŸ“… 2025-07-23
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
In colorectal cancer screening, deep learning models are often hindered by heavy reliance on labeled data and opaque decision-making. To address these limitations, we propose EndoFinderβ€”a self-supervised, online polyp retrieval framework grounded in multi-view scene representation for interpretable and scalable endoscopic assistance. Our method jointly optimizes contrastive learning and image reconstruction to train a polyp-aware encoder; employs a scene-representation Transformer that fuses multi-view endoscopic observations to model polyps as 3D structures; and introduces a hash-based discretization mechanism enabling efficient, transparent real-time retrieval. Evaluated on both public benchmarks and a newly curated dataset, EndoFinder achieves significant improvements in polyp re-identification accuracy (+8.2% mAP) and pathological classification (up to +5.7% F1-score), while reducing annotation dependency by >90%. The framework delivers clinically actionable, human-interpretable AI support for colonoscopy without requiring exhaustive pixel-level annotations.

Technology Category

Application Category

πŸ“ Abstract
Colorectal cancer (CRC) remains a leading cause of cancer-related mortality, underscoring the importance of timely polyp detection and diagnosis. While deep learning models have improved optical-assisted diagnostics, they often demand extensive labeled datasets and yield "black-box" outputs with limited interpretability. In this paper, we propose EndoFinder, an online polyp retrieval framework that leverages multi-view scene representations for explainable and scalable CRC diagnosis. First, we develop a Polyp-aware Image Encoder by combining contrastive learning and a reconstruction task, guided by polyp segmentation masks. This self-supervised approach captures robust features without relying on large-scale annotated data. Next, we treat each polyp as a three-dimensional "scene" and introduce a Scene Representation Transformer, which fuses multiple views of the polyp into a single latent representation. By discretizing this representation through a hashing layer, EndoFinder enables real-time retrieval from a compiled database of historical polyp cases, where diagnostic information serves as interpretable references for new queries. We evaluate EndoFinder on both public and newly collected polyp datasets for re-identification and pathology classification. Results show that EndoFinder outperforms existing methods in accuracy while providing transparent, retrieval-based insights for clinical decision-making. By contributing a novel dataset and a scalable, explainable framework, our work addresses key challenges in polyp diagnosis and offers a promising direction for more efficient AI-driven colonoscopy workflows. The source code is available at https://github.com/ku262/EndoFinder-Scene.
Problem

Research questions and friction points this paper is trying to address.

Improves colorectal polyp diagnosis with explainable AI
Reduces reliance on large labeled datasets via self-supervised learning
Enables real-time retrieval of similar historical polyp cases
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised polyp-aware image encoder
Scene Representation Transformer for 3D fusion
Real-time retrieval via hashing layer
πŸ”Ž Similar Papers
R
Ruijie Yang
Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, Shanghai, China; School of Software Technology, Zhejiang University, Ningbo, Zhejiang, China; Shanghai Key Laboratory of MICCAI, Shanghai, China; Shanghai Institute for Advanced Study of Zhejiang University, Shanghai, China
Y
Yan Zhu
Endoscopy Center and Endoscopy Research Institute, Zhongshan Hospital, Fudan University, Shanghai, China; Shanghai Collaborative Innovation Center of Endoscopy, Shanghai, China
P
Peiyao Fu
Endoscopy Center and Endoscopy Research Institute, Zhongshan Hospital, Fudan University, Shanghai, China; Shanghai Collaborative Innovation Center of Endoscopy, Shanghai, China
Y
Yizhe Zhang
School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, Jiangsu, China
Zhihua Wang
Zhihua Wang
City University of Hong Kong
Computer VisionBiomedical EngineeringRobotics
Q
Quanlin Li
Endoscopy Center and Endoscopy Research Institute, Zhongshan Hospital, Fudan University, Shanghai, China; Shanghai Collaborative Innovation Center of Endoscopy, Shanghai, China
P
Pinghong Zhou
Endoscopy Center and Endoscopy Research Institute, Zhongshan Hospital, Fudan University, Shanghai, China; Shanghai Collaborative Innovation Center of Endoscopy, Shanghai, China
Xian Yang
Xian Yang
University of Manchester
Artificial IntelligenceMachine LearningHealthcare AINatural Language Processing
S
Shuo Wang
Digital Medical Research Center, School of Basic Medical Sciences, Fudan University, Shanghai, China