Multimodal Distillation-Driven Ensemble Learning for Long-Tailed Histopathology Whole Slide Images Analysis

📅 2025-03-02
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the long-tailed class distribution problem in whole-slide image (WSI) analysis for computational pathology. We propose a multi-expert ensemble framework built upon multi-instance learning (MIL), featuring a shared aggregator and multiple specialized decoders to jointly model distributional diversity. To enhance semantic-aware representation learning for minority classes, we introduce a learnable text-prompt-guided multimodal knowledge distillation mechanism. Our approach innovatively integrates a pre-trained pathology text encoder, dynamic prompt tuning, and consistency regularization to strengthen discriminative capability for sparse categories. Evaluated on two long-tailed WSI benchmarks—Camelyon+-LT and PANDA-LT—the method achieves over 8.2% absolute improvement in minority-class classification accuracy over state-of-the-art methods, while demonstrating significantly enhanced generalization and robustness.

Technology Category

Application Category

📝 Abstract
Multiple Instance Learning (MIL) plays a significant role in computational pathology, enabling weakly supervised analysis of Whole Slide Image (WSI) datasets. The field of WSI analysis is confronted with a severe long-tailed distribution problem, which significantly impacts the performance of classifiers. Long-tailed distributions lead to class imbalance, where some classes have sparse samples while others are abundant, making it difficult for classifiers to accurately identify minority class samples. To address this issue, we propose an ensemble learning method based on MIL, which employs expert decoders with shared aggregators and consistency constraints to learn diverse distributions and reduce the impact of class imbalance on classifier performance. Moreover, we introduce a multimodal distillation framework that leverages text encoders pre-trained on pathology-text pairs to distill knowledge and guide the MIL aggregator in capturing stronger semantic features relevant to class information. To ensure flexibility, we use learnable prompts to guide the distillation process of the pre-trained text encoder, avoiding limitations imposed by specific prompts. Our method, MDE-MIL, integrates multiple expert branches focusing on specific data distributions to address long-tailed issues. Consistency control ensures generalization across classes. Multimodal distillation enhances feature extraction. Experiments on Camelyon+-LT and PANDA-LT datasets show it outperforms state-of-the-art methods.
Problem

Research questions and friction points this paper is trying to address.

Addresses long-tailed distribution in histopathology WSI analysis.
Proposes ensemble learning with expert decoders for class imbalance.
Introduces multimodal distillation to enhance semantic feature extraction.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Ensemble learning with expert decoders and shared aggregators
Multimodal distillation using pre-trained text encoders
Learnable prompts for flexible knowledge distillation
🔎 Similar Papers
No similar papers found.
Xitong Ling
Xitong Ling
Tsinghua University
AI4PathologyFoundation-ModelVision-Language-Model
Y
Yifeng Ping
School of Interdisciplinary Studies, Lingnan University, China
J
Jiawen Li
Shenzhen International Graduate School, Tsinghua University, China
J
Jing Peng
Shenzhen International Graduate School, Tsinghua University, China
Y
Yuxuan Chen
Shenzhen International Graduate School, Tsinghua University, China
Minxi Ouyang
Minxi Ouyang
Tsinghua University
cvpathology
Y
Yizhi Wang
Shenzhen International Graduate School, Tsinghua University, China
Yonghong He
Yonghong He
清华大学深圳国际研究生院
生物医学工程,光学成像,AI图像处理、病理大模型
T
Tian Guan
Shenzhen International Graduate School, Tsinghua University, China
X
Xiaoping Liu
Zhongnan Hospital, Wuhan University, China
L
Lianghui Zhu
Shenzhen International Graduate School, Tsinghua University, China