ReconMIL: Synergizing Latent Space Reconstruction with Bi-Stream Mamba for Whole Slide Image Analysis

📅 2026-03-20
🤖 AI Summary
This work addresses the domain shift between general-purpose features and task-specific pathological requirements in whole-slide image analysis, as well as the tendency of global aggregation to obscure sparse diagnostic signals. To this end, the authors propose a latent space reconstruction module that adaptively maps generic features onto task-specific manifolds. They further introduce a bi-stream architecture: a Mamba stream captures long-range contextual dependencies, while a CNN stream preserves local morphological abnormalities. A scale-adaptive fusion mechanism dynamically balances global and local information across the two streams. The authors report that the method consistently outperforms existing state-of-the-art methods on multiple benchmarks for both diagnostic classification and survival prediction, suppressing background interference and improving fine-grained lesion localization.

📝 Abstract
Whole slide image (WSI) analysis heavily relies on multiple instance learning (MIL). While recent methods benefit from large-scale foundation models and advanced sequence modeling to capture long-range dependencies, they still struggle with two critical issues. First, directly applying frozen, task-agnostic features often leads to suboptimal separability due to the domain gap with specific histological tasks. Second, relying solely on global aggregators can cause over-smoothing, where sparse but critical diagnostic signals are overshadowed by the dominant background context. In this paper, we present ReconMIL, a novel framework designed to bridge this domain gap and balance global-local feature aggregation. Our approach introduces a Latent Space Reconstruction module that adaptively projects generic features into a compact, task-specific manifold, improving boundary delineation. To prevent information dilution, we develop a bi-stream architecture combining a Mamba-based global stream for contextual priors and a CNN-based local stream to preserve subtle morphological anomalies. A scale-adaptive selection mechanism dynamically fuses these two streams, determining when to rely on overall architecture versus local saliency. Evaluations across multiple diagnostic and survival prediction benchmarks show that ReconMIL consistently outperforms current state-of-the-art methods, effectively localizing fine-grained diagnostic regions while suppressing background noise. Visualization results confirm the model's superior ability to localize diagnostic regions by effectively balancing global structure and local granularity.
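
The idea of fusing a global and a local stream with a learned gate can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the layers, so the bag-wide mean below merely stands in for the Mamba stream, a sliding-window average stands in for the CNN stream, and the names `global_stream`, `local_stream`, `gated_fusion`, `w_gate`, and `b_gate` are all hypothetical. It only shows how a per-patch sigmoid gate could decide between global context and local detail.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def global_stream(bag: List[Vector]) -> List[Vector]:
    """Stand-in for the global (Mamba) stream: each patch gets the bag-wide mean."""
    n, dim = len(bag), len(bag[0])
    mean = [sum(v[d] for v in bag) / n for d in range(dim)]
    return [list(mean) for _ in bag]


def local_stream(bag: List[Vector], radius: int = 1) -> List[Vector]:
    """Stand-in for the local (CNN) stream: average over neighboring patches."""
    dim = len(bag[0])
    out = []
    for i in range(len(bag)):
        window = bag[max(0, i - radius): i + radius + 1]
        out.append([sum(v[d] for v in window) / len(window) for d in range(dim)])
    return out


def gated_fusion(
    g: List[Vector], l: List[Vector], w_gate: Vector, b_gate: float
) -> Tuple[List[Vector], List[float]]:
    """Fuse the two streams per patch with a scalar gate a in [0, 1].

    The gate is a sigmoid of a linear score over the concatenated
    [global, local] features (assumed form, not taken from the paper).
    fused = a * global + (1 - a) * local
    """
    fused, gates = [], []
    for gv, lv in zip(g, l):
        score = sum(w * x for w, x in zip(w_gate, gv + lv)) + b_gate
        a = sigmoid(score)
        gates.append(a)
        fused.append([a * gx + (1.0 - a) * lx for gx, lx in zip(gv, lv)])
    return fused, gates


# A bag of three 1-D patch features; the last patch is a sparse "signal".
bag = [[0.0], [0.0], [3.0]]
g = global_stream(bag)
l = local_stream(bag)
fused, gates = gated_fusion(g, l, w_gate=[0.0, 0.0], b_gate=0.0)
```

With zero gate weights every gate equals 0.5, so each fused feature is the plain average of the two streams. A strongly negative `b_gate` pushes the gate toward 0 and the output toward the local stream, which is the regime in which a sparse signal is least diluted by the bag-wide context.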

Problem

Research questions and friction points this paper is trying to address.

whole slide image
multiple instance learning
domain gap
over-smoothing
diagnostic signal

Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent Space Reconstruction
Bi-Stream Architecture
Mamba
Multiple Instance Learning
Whole Slide Image Analysis