Memory-SAM: Human-Prompt-Free Tongue Segmentation via Retrieval-to-Prompt

2025-10-17
AI Summary
Tongue image segmentation usually depends on manual annotations or interactive prompts. To remove that dependence, this paper proposes a fully automated segmentation paradigm that requires no human intervention. The method builds a compact library of prior cases and uses DINOv3 to extract dense self-supervised features. FAISS-based approximate nearest-neighbor retrieval then generates prompt points automatically. A mask-constrained correspondence point distillation mechanism turns these matches into prompts that guide SAM2 toward precise segmentation. To the authors' knowledge, this is the first approach to achieve automatic tongue image segmentation without model fine-tuning, manual prompting, or annotation of the query image. Only the small memory of prior cases carries masks. On a mixed test set it reaches an mIoU of 0.9863, substantially outperforming an FCN (0.8188) and a detector-to-box SAM baseline (0.1839). The method is particularly robust when delineating irregular tongue boundaries and on real-world (in-the-wild) images, and it is data-efficient.
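
The retrieval step can be pictured with a minimal sketch. It assumes that each memory case is summarized by one global descriptor vector, for example a pooled DINOv3 feature. The summary does not say how the paper pools features. The sketch also replaces FAISS with an exact brute-force cosine search. On L2-normalized vectors, that search returns the same neighbors as a flat inner-product index. An approximate index trades a little recall for speed on large memories. The descriptors and function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    return [x / norm for x in v]

def retrieve_top_k(query: Sequence[float],
                   memory: Sequence[Sequence[float]],
                   k: int = 1) -> List[Tuple[int, float]]:
    """Exact inner-product search over L2-normalized descriptors.

    Returns (memory_index, cosine_similarity) pairs, best first.
    This brute-force loop stands in for an ANN index such as FAISS.
    """
    q = l2_normalize(query)
    scored = []
    for idx, item in enumerate(memory):
        m = l2_normalize(item)
        scored.append((idx, sum(a * b for a, b in zip(q, m))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical 4-D global descriptors for three memory cases.
memory_bank = [
    [1.0, 0.0, 0.0, 0.0],
    [0.7, 0.7, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
query_desc = [0.9, 0.1, 0.0, 0.0]
print(retrieve_top_k(query_desc, memory_bank, k=2))
```

The retrieved index identifies the exemplar whose mask and dense features feed the prompt-generation step.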

๐Ÿ“ Abstract
Accurate tongue segmentation is crucial for reliable TCM analysis. Supervised models require large annotated datasets, while SAM-family models remain prompt-driven. We present Memory-SAM, a training-free, human-prompt-free pipeline that automatically generates effective prompts from a small memory of prior cases via dense DINOv3 features and FAISS retrieval. Given a query image, mask-constrained correspondences to the retrieved exemplar are distilled into foreground/background point prompts that guide SAM2 without manual clicks or model fine-tuning. We evaluate on 600 expert-annotated images (300 controlled, 300 in-the-wild). On the mixed test split, Memory-SAM achieves mIoU 0.9863, surpassing FCN (0.8188) and a detector-to-box SAM baseline (0.1839). On controlled data, ceiling effects above 0.98 make small differences less meaningful given annotation variability, while our method shows clear gains under real-world conditions. Results indicate that retrieval-to-prompt enables data-efficient, robust segmentation of irregular boundaries in tongue imaging. The code is publicly available at https://github.com/jw-chae/memory-sam.
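
The abstract reports mIoU but does not spell out how it is averaged. The minimal sketch below assumes per-image foreground IoU averaged over the test images. Some papers instead average over the foreground and background classes, which gives different numbers. The masks are tiny hypothetical grids.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[int]]  # binary mask: 1 = tongue, 0 = background

def iou(pred: Mask, gt: Mask) -> float:
    """Foreground intersection-over-union of two same-sized binary masks."""
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += int(p == 1 and g == 1)
            union += int(p == 1 or g == 1)
    # Both masks empty: treat as a perfect match.
    return 1.0 if union == 0 else inter / union

def mean_iou(preds: List[Mask], gts: List[Mask]) -> float:
    """Average per-image foreground IoU over a test split."""
    scores = [iou(p, g) for p, g in zip(preds, gts)]
    return sum(scores) / len(scores)

gt_a = [[0, 1, 1], [0, 1, 1]]
pred_a = [[0, 1, 1], [0, 1, 0]]   # misses one tongue pixel: IoU 3/4
gt_b = [[1, 1, 0], [0, 0, 0]]
pred_b = [[1, 1, 0], [0, 0, 0]]   # exact match: IoU 1
print(mean_iou([pred_a, pred_b], [gt_a, gt_b]))
```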
Problem

Research questions and friction points this paper is trying to address.

Automates tongue segmentation without manual prompts or training
Generates prompts via retrieval from prior cases using DINOv3 features
Enables robust, data-efficient segmentation of irregular tongue boundaries
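
SAM2 can only use correspondences found on a feature grid as point prompts after they are converted to pixel coordinates. The sketch below assumes a ViT-style patch grid with 16-pixel patches. The patch size is a parameter, and the summary does not state it. The sketch returns coordinates as (x, y) pairs, labelled 1 for foreground and 0 for background, which is the convention SAM-style point prompts use. `patch_to_pixel` and `build_point_prompts` are hypothetical helpers, not part of the released code.

```python
from typing import List, Tuple

def patch_to_pixel(patch_index: int, grid_w: int, patch_size: int,
                   scale_x: float = 1.0, scale_y: float = 1.0) -> Tuple[float, float]:
    """Map a flattened patch index to the (x, y) pixel center of that patch.

    scale_x / scale_y rescale from the feature-extractor input size back to
    the original image size if the image was resized before encoding.
    """
    row, col = divmod(patch_index, grid_w)
    x = (col + 0.5) * patch_size * scale_x
    y = (row + 0.5) * patch_size * scale_y
    return (x, y)

def build_point_prompts(fg_patches: List[int], bg_patches: List[int],
                        grid_w: int, patch_size: int = 16):
    """Return (coords, labels) for point prompts: label 1 = foreground, 0 = background."""
    coords = [patch_to_pixel(i, grid_w, patch_size) for i in fg_patches + bg_patches]
    labels = [1] * len(fg_patches) + [0] * len(bg_patches)
    return coords, labels

# 4x4 patch grid, e.g. a 64x64 input with 16-pixel patches.
coords, labels = build_point_prompts(fg_patches=[5, 6], bg_patches=[0], grid_w=4)
print(coords, labels)
```

Placing each prompt at the patch center is a simple choice. A finer implementation might refine the location within the patch.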
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatically generates prompts from a small memory of prior cases
Uses dense DINOv3 features and FAISS retrieval
Achieves training-free segmentation via mask-constrained correspondences
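
The summary does not spell out the distillation rule. The sketch below follows one plausible reading, under that assumption. Matching is restricted to exemplar patches inside the exemplar's mask (for foreground) or outside it (for background). Only mutual nearest neighbours between exemplar and query patch features are kept. The highest-similarity query patches are returned as prompt locations. Mutual nearest-neighbour filtering is an illustrative choice that the source does not confirm. The sketch also assumes the exemplar mask has already been downsampled to the patch grid. The function and argument names are hypothetical.

```python
import math
from typing import List, Sequence

def _unit(v: Sequence[float]) -> List[float]:
    """L2-normalize a feature vector (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def _cos(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two already-normalized vectors (their dot product)."""
    return sum(x * y for x, y in zip(a, b))

def mask_constrained_matches(exemplar_feats, exemplar_mask, query_feats,
                             want_fg=True, top_n=2):
    """Pick query patches that mutually match exemplar patches of one mask class.

    exemplar_feats, query_feats: per-patch feature vectors (flattened grids).
    exemplar_mask: per-patch 0/1 labels for the exemplar (1 = tongue).
    Returns up to top_n query patch indices, highest similarity first.
    """
    ex = [_unit(f) for f in exemplar_feats]
    qu = [_unit(f) for f in query_feats]
    target = 1 if want_fg else 0
    candidates = []
    for i, label in enumerate(exemplar_mask):
        if label != target:
            continue  # mask constraint: skip exemplar patches of the other class
        # Forward match: best query patch for exemplar patch i
        j = max(range(len(qu)), key=lambda q: _cos(ex[i], qu[q]))
        # Backward match: best exemplar patch for query patch j
        back = max(range(len(ex)), key=lambda e: _cos(qu[j], ex[e]))
        if back == i:  # keep mutual nearest neighbours only
            candidates.append((_cos(ex[i], qu[j]), j))
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [j for _, j in candidates[:top_n]]

# Toy 3-D patch features: exemplar patches 0 and 1 lie inside its tongue mask.
exemplar_feats = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
exemplar_mask = [1, 1, 0]
query_feats = [[0.0, 0.0, 1.0], [0.9, 0.1, 0.0], [0.2, 0.8, 0.0]]
print(mask_constrained_matches(exemplar_feats, exemplar_mask, query_feats, True))
print(mask_constrained_matches(exemplar_feats, exemplar_mask, query_feats, False))
```

The mutual check discards ambiguous matches, such as a query patch that several exemplar patches point to. The remaining indices can then be converted to pixel-space foreground and background points for SAM2.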
Joongwon Chae
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Lihui Luo
Tsinghua Shenzhen International Graduate School, Tsinghua University, Shenzhen, China
Xi Yuan
Zhejiang Key Laboratory of Imaging and Interventional Medicine, The Fifth Affiliated Hospital of Wenzhou Medical University, Lishui, China
Dongmei Yu
Affiliated Fifth Hospital, Wenzhou Medical University, Wenzhou, Zhejiang, China
Zhenglin Chen
Zhejiang Key Laboratory of Imaging and Interventional Medicine, The Fifth Affiliated Hospital of Wenzhou Medical University, Lishui, China
Lian Zhang
Student in Electrical Engineering and Computer Science, Vanderbilt University
Intelligent Human Machine Systems, Machine Learning, Artificial Intelligence, Affective Computing, Human-Computer Interactions
Peiwu Qin
Tsinghua Shenzhen International Graduate School
Image Processing, TCM