PTCMIL: Multiple Instance Learning via Prompt Token Clustering for Whole Slide Image Analysis

📅 2025-07-24
🤖 AI Summary
Multiple instance learning (MIL) for whole-slide image (WSI) analysis is computationally expensive and struggles to model task- and slide-specific heterogeneity. To address this, the paper proposes PTCMIL, a vision Transformer (ViT) framework built on learnable prompt tokens. PTCMIL unifies clustering and prediction within the prompt token space, enabling lightweight and interpretable feature aggregation via projection-based clustering and prototype pooling. It further introduces a dynamic token merging mechanism to adaptively integrate patch-level diversity. Evaluated on eight public WSI datasets, PTCMIL outperforms state-of-the-art MIL methods in both classification and survival analysis tasks. Ablation studies support the robustness of prompt-guided clustering and of the task-coordinated design.

📝 Abstract
Multiple Instance Learning (MIL) has advanced WSI analysis but struggles with the complexity and heterogeneity of WSIs. Existing MIL methods face challenges in aggregating diverse patch information into robust WSI representations. While ViTs and clustering-based approaches show promise, they are computationally intensive and fail to capture task-specific and slide-specific variability. To address these limitations, we propose PTCMIL, a novel Prompt Token Clustering-based ViT for MIL aggregation. By introducing learnable prompt tokens into the ViT backbone, PTCMIL unifies clustering and prediction tasks in an end-to-end manner. It dynamically aligns clustering with downstream tasks, using projection-based clustering tailored to each WSI, reducing complexity while preserving patch heterogeneity. Through token merging and prototype-based pooling, PTCMIL efficiently captures task-relevant patterns. Extensive experiments on eight datasets demonstrate its superior performance in classification and survival analysis tasks, outperforming state-of-the-art methods. Systematic ablation studies confirm its robustness and strong interpretability. The code is released at https://github.com/ubc-tea/PTCMIL.
Problem

Research questions and friction points this paper is trying to address.

Addresses complexity and heterogeneity in WSI analysis
Improves aggregation of patch information into robust representations
Reduces computational intensity while capturing task-specific variability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces learnable prompt tokens in ViT
Unifies clustering and prediction end-to-end
Uses projection-based clustering per WSI
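
The aggregation idea listed above (project patches onto prompt tokens, merge each cluster, pool the prototypes) can be illustrated with a toy sketch. This is not the authors' implementation: the function names `assign_clusters`, `merge_tokens`, and `pool_prototypes` are hypothetical, the prompt tokens are fixed by hand rather than learned inside a ViT, and hard argmax assignment, mean merging, and mean pooling are simplifying assumptions chosen only to make the data flow concrete.

```python
# Toy sketch of prompt-token clustering for one slide (assumptions throughout).
# Patches and prompt tokens are plain lists of floats; nothing here is learned.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def assign_clusters(patches, prompts):
    """Assign each patch to the prompt token with the largest projection (assumed rule)."""
    return [max(range(len(prompts)), key=lambda k: dot(x, prompts[k]))
            for x in patches]

def merge_tokens(patches, labels, num_prompts):
    """Average the patches in each cluster into one prototype; empty clusters give None."""
    dim = len(patches[0])
    sums = [[0.0] * dim for _ in range(num_prompts)]
    counts = [0] * num_prompts
    for x, k in zip(patches, labels):
        counts[k] += 1
        for j, value in enumerate(x):
            sums[k][j] += value
    prototypes = [[s / c for s in row] if c else None
                  for row, c in zip(sums, counts)]
    return prototypes, counts

def pool_prototypes(prototypes):
    """Mean-pool the non-empty prototypes into a single slide-level vector."""
    kept = [p for p in prototypes if p is not None]
    dim = len(kept[0])
    return [sum(p[j] for p in kept) / len(kept) for j in range(dim)]

# Four 2-D "patch features" and two hand-picked "prompt tokens" (illustrative values).
patches = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.2, 0.8]]
prompts = [[1.0, 0.0], [0.0, 1.0]]

labels = assign_clusters(patches, prompts)
prototypes, counts = merge_tokens(patches, labels, len(prompts))
slide_vector = pool_prototypes(prototypes)
print(labels, counts, slide_vector)
```

In this sketch a downstream classifier or survival head would consume `slide_vector`. Because the clusters are defined by the prompt tokens, training those tokens together with the prediction head is what would tie clustering to the task, which is the end-to-end property the summary describes.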
Authors

Beidi Zhao (The University of British Columbia)
SangMook Kim (Department of Artificial Intelligence, Chungnam National University; research interests: Federated Learning, Active Learning, Noisy Label Learning, Medical AI)
Hao Chen (The Hong Kong University of Science and Technology)
Chen Zhou (The University of British Columbia, BC Cancer Agency)
Zu-hua Gao (The University of British Columbia, BC Cancer Agency)
Gang Wang (The University of British Columbia, BC Cancer Agency)
Xiaoxiao Li (The University of British Columbia, Vector Institute)