🤖 AI Summary
This work addresses the challenges in few-shot weakly supervised whole-slide image (WSI) classification, where instance-level annotations are scarce, existing vision-language model prompt tuning methods involve excessive trainable parameters, and hard instance filtering leads to semantic information loss. To overcome these limitations, the authors propose a parameter-efficient prompt tuning approach that introduces feature scaling and shifting within the text encoder and devises a soft hierarchical text-guided strategy. This design effectively integrates the prior knowledge of vision-language models with the hierarchical structure inherent in WSIs while preserving critical semantic content. Evaluated on breast, lung, and ovarian cancer datasets, the method achieves up to a 13.8% improvement in classification accuracy, reduces trainable parameters by 5.8%–18.1%, and demonstrates strong performance in weakly supervised tumor localization tasks.
📝 Abstract
Whole Slide Images (WSIs) are gigapixel-scale and are typically partitioned into small instances in WSI classification pipelines for computational feasibility. However, obtaining extensive instance-level annotations is costly, making few-shot weakly supervised WSI classification (FSWC) crucial for learning from limited slide-level labels. Recently, pre-trained vision-language models (VLMs) have been adopted in FSWC, yet they exhibit several limitations. Existing prompt tuning methods in FSWC substantially increase both the number of trainable parameters and the inference overhead. Moreover, current methods discard instances with low alignment to text embeddings from VLMs, potentially leading to information loss. To address these challenges, we propose two key contributions. First, we introduce a new parameter-efficient prompt tuning method that scales and shifts features in the text encoder, significantly reducing the computational cost. Second, to leverage not only the pre-trained knowledge of VLMs but also the inherent hierarchical structure of WSIs, we introduce a WSI representation learning approach with a soft hierarchical textual guidance strategy that avoids hard instance filtering. Comprehensive evaluations on pathology datasets covering breast, lung, and ovarian cancer demonstrate consistent improvements of up to 10.9%, 7.8%, and 13.8%, respectively, over state-of-the-art FSWC methods. Our method reduces the number of trainable parameters by 18.1% on both the breast and lung cancer datasets and by 5.8% on the ovarian cancer dataset, while also excelling at weakly supervised tumor localization. Code is available at https://github.com/Jayanie/HIPSS.
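The abstract's first contribution, scaling and shifting features inside a frozen text encoder, can be sketched in a minimal, illustrative form. This is not the paper's implementation: the module name `ScaleShift`, the stand-in `frozen_layer`, and the dimensions are all assumptions, chosen only to show why the trainable parameter count stays small.

```python
import numpy as np

class ScaleShift:
    """Per-feature affine modulation: y = gamma * x + beta.

    Hypothetical SSF-style module: only gamma and beta (2 * dim parameters)
    would be trained; the backbone layer it follows stays frozen.
    """
    def __init__(self, dim):
        self.gamma = np.ones(dim)   # trainable scale, identity at init
        self.beta = np.zeros(dim)   # trainable shift, zero at init

    def __call__(self, x):
        return x * self.gamma + self.beta

# A frozen linear layer standing in for one block of the text encoder.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))     # frozen weights, never updated

def frozen_layer(x):
    return x @ W.T

ssf = ScaleShift(dim=8)
x = rng.standard_normal((4, 8))     # a small batch of token features
out = ssf(frozen_layer(x))          # modulated features

# At initialization the modulation is the identity transform.
assert np.allclose(out, frozen_layer(x))

# 2 * dim trainable parameters per inserted module, versus dim * dim
# if the frozen weight matrix itself were fine-tuned.
n_trainable = ssf.gamma.size + ssf.beta.size
```

The design choice this illustrates: because each inserted module adds only 2 × dim parameters (16 here, versus 64 in the frozen weight matrix), modulating many layers of the text encoder remains far cheaper than full fine-tuning or token-based prompt tuning.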