Single GPU Task Adaptation of Pathology Foundation Models for Whole Slide Image Analysis

📅 2025-06-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Adapting pathology foundation models (PFMs) to clinical tasks on gigapixel whole-slide images (WSIs) is difficult when only weak supervision (i.e., WSI-level labels) is available and training must fit on a single GPU. Method: We propose TAPFM, an end-to-end task-adaptation framework that uses the ViT's self-attention mechanism for multiple-instance learning (MIL) aggregation and jointly optimizes patch-level feature representations and attention weights. Crucially, TAPFM keeps the PFM backbone and the MIL aggregator in separate computational graphs during training, which stabilizes end-to-end fine-tuning on a single GPU. Results: On mutation prediction tasks for bladder cancer and lung adenocarcinoma, including multi-label classification of clinically actionable mutations, TAPFM consistently outperforms conventional weakly supervised approaches. Both training and inference run on a single GPU, which makes adaptation practical on standard hardware in resource-constrained clinical settings.

📝 Abstract

Pathology foundation models (PFMs) have emerged as powerful tools for analyzing whole slide images (WSIs). However, adapting these pretrained PFMs to specific clinical tasks presents considerable challenges, primarily because only weak (WSI-level) labels are available for gigapixel images, which necessitates a multiple instance learning (MIL) paradigm for effective WSI analysis. This paper proposes a novel approach for single-GPU Task Adaptation of PFMs (TAPFM) that uses vision transformer (ViT) attention for MIL aggregation while optimizing both feature representations and attention weights. The proposed approach maintains separate computational graphs for the MIL aggregator and the PFM to create stable training dynamics that align with downstream task objectives during end-to-end adaptation. Evaluated on mutation prediction tasks for bladder cancer and lung adenocarcinoma across institutional and TCGA cohorts, TAPFM consistently outperforms conventional approaches, with H-Optimus-0 (TAPFM) outperforming the benchmarks. TAPFM also effectively handles multi-label classification of actionable mutations. Thus, TAPFM makes adaptation of powerful pretrained PFMs practical on standard hardware for various clinical applications.
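
To make the aggregation step concrete, here is a minimal sketch of attention-weighted MIL pooling followed by a multi-label head. It is an illustration under stated assumptions, not the paper's implementation: the attention scores are placeholder inputs (how TAPFM derives them from the ViT's self-attention is not specified in this summary), and the function names, toy features, and head weights are all hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of per-patch scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def aggregate_slide(patch_features, attention_scores):
    """Collapse a bag of patch features into one slide-level vector.

    Each patch contributes in proportion to its normalized attention weight.
    """
    weights = softmax(attention_scores)
    dim = len(patch_features[0])
    slide_vec = [
        sum(w * feat[d] for w, feat in zip(weights, patch_features))
        for d in range(dim)
    ]
    return slide_vec, weights

def predict_mutations(slide_vec, head_weights, head_bias):
    """Multi-label head: one independent sigmoid per mutation label."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, slide_vec)) + b)
        for row, b in zip(head_weights, head_bias)
    ]

# Toy bag: three patches with 2-dimensional features (hypothetical values).
patch_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attention_scores = [2.0, 0.0, 0.0]  # assumed to come from the ViT's attention

slide_vec, weights = aggregate_slide(patch_features, attention_scores)

# Hypothetical linear head for two mutation labels.
head_weights = [[1.0, -1.0], [0.5, 0.5]]
head_bias = [0.0, 0.0]
probs = predict_mutations(slide_vec, head_weights, head_bias)
print(weights, slide_vec, probs)
```

Using a sigmoid per label rather than a softmax across labels lets several mutations be predicted as present on the same slide, which is what a multi-label setting requires. Only the slide-level label is needed to train such a head, matching the weak-supervision setting described above.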

Problem

Research questions and friction points this paper is trying to address.

Adapting pathology foundation models for clinical tasks with weak WSI-level labels

Optimizing feature representations and attention weights for MIL aggregation

Enabling single-GPU adaptation of PFMs for multi-label mutation prediction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses vision transformer attention for MIL aggregation

Maintains separate computational graphs for the MIL aggregator and the PFM to stabilize training

Optimizes feature representations and attention weights
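
One generic way to split a backward pass across two computational graphs is to treat the features as constant inputs to the aggregator, compute the loss gradient with respect to those features, and then hand only that gradient to the backbone. The sketch below shows this hand-off on a scalar toy model with hand-written gradients. Whether TAPFM performs exactly this hand-off is an assumption, since the summary above does not give the mechanism; the model, loss, and all names are hypothetical.

```python
def forward(w, u, xs):
    """Toy pipeline: scalar 'backbone' weight w, scalar 'aggregator' weight u."""
    feats = [w * x for x in xs]        # backbone: one scalar feature per patch
    pooled = sum(feats) / len(feats)   # aggregator pooling (uniform, for brevity)
    return feats, u * pooled           # aggregator head -> slide-level score

def loss(w, u, xs, y):
    _, z = forward(w, u, xs)
    return 0.5 * (z - y) ** 2          # squared error keeps the algebra short

def two_stage_grads(w, u, xs, y):
    feats, z = forward(w, u, xs)
    n = len(feats)
    err = z - y

    # Stage 1 (aggregator graph): feats are treated as constant leaf inputs.
    grad_u = err * (sum(feats) / n)
    grad_feats = [err * u / n for _ in feats]

    # Stage 2 (backbone graph): sees only the feature gradients from stage 1.
    grad_w = sum(g * x for g, x in zip(grad_feats, xs))
    return grad_w, grad_u

xs, y = [0.5, -1.0, 2.0], 1.0          # hypothetical patch inputs and slide label
w, u = 0.8, 1.5

grad_w, grad_u = two_stage_grads(w, u, xs, y)

# Check against central finite differences of the end-to-end loss.
eps = 1e-6
fd_w = (loss(w + eps, u, xs, y) - loss(w - eps, u, xs, y)) / (2 * eps)
fd_u = (loss(w, u + eps, xs, y) - loss(w, u - eps, xs, y)) / (2 * eps)
print(grad_w, fd_w, grad_u, fd_u)
```

In this toy, the two-stage gradients agree with the end-to-end ones, so splitting the graph changes where memory is held and when each part is updated, not the gradient itself. In PyTorch, a comparable hand-off is commonly written by calling `detach()` on the features before the aggregator and later passing their `.grad` to `backward(gradient=...)` on the original feature tensor.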
