MedPruner: Training-Free Hierarchical Token Pruning for Efficient 3D Medical Image Understanding in Vision-Language Models

📅 2026-03-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing 3D medical vision-language models are computationally inefficient: naively concatenating consecutive 2D slices introduces substantial anatomical redundancy, and fixed token pruning ratios ignore the varying information density across slices. To address this, the paper proposes a training-free, model-agnostic hierarchical token pruning framework. The approach first eliminates inter-slice redundancy through anchor-based filtering, then adaptively compresses at the token level by selecting the most informative tokens according to cumulative attention weights. By moving beyond rigid, fixed-ratio pruning, the method retains less than 5% of the original visual tokens on three 3D medical benchmarks and across multiple medical vision-language models while matching or even surpassing baseline performance, substantially accelerating inference.

📝 Abstract
While specialized Medical Vision-Language Models (VLMs) have achieved remarkable success in interpreting 2D and 3D medical modalities, their deployment for 3D volumetric data remains constrained by significant computational inefficiencies. Current architectures typically suffer from massive anatomical redundancy due to the direct concatenation of consecutive 2D slices and lack the flexibility to handle heterogeneous information densities across different slices using fixed pruning ratios. To address these challenges, we propose MedPruner, a training-free and model-agnostic hierarchical token pruning framework specifically designed for efficient 3D medical image understanding. MedPruner introduces a two-stage mechanism: an Inter-slice Anchor-based Filtering module to eliminate slice-level temporal redundancy, followed by a Dynamic Information Nucleus Selection strategy that achieves adaptive token-level compression by quantifying cumulative attention weights. Extensive experiments on three 3D medical benchmarks and across three diverse medical VLMs reveal massive token redundancy in existing architectures. Notably, MedPruner enables models such as MedGemma to maintain or even exceed their original performance while retaining fewer than 5% of visual tokens, thereby drastically reducing computational overhead and validating the necessity of dynamic token selection for practical clinical deployment. Our code will be released.
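The paper does not publish its scoring functions here (code is to be released), but the two stages described in the abstract can be illustrated with a minimal sketch. All function names, thresholds, and interfaces below are assumptions for illustration: slice-level redundancy is removed by comparing each slice's pooled feature to the last kept anchor, and token-level compression keeps the smallest token set whose cumulative attention mass reaches a target, so the kept count adapts to each slice rather than following a fixed ratio.

```python
import numpy as np

def filter_redundant_slices(slice_feats, sim_thresh=0.95):
    """Anchor-based inter-slice filtering (illustrative, not the paper's exact rule).

    Keeps slice 0 as the first anchor; a later slice is kept (and becomes the
    new anchor) only if its cosine similarity to the current anchor falls
    below sim_thresh, i.e. it carries sufficiently novel anatomy."""
    kept = [0]
    anchor = slice_feats[0]
    for i in range(1, len(slice_feats)):
        a, b = anchor, slice_feats[i]
        cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
        if cos < sim_thresh:
            kept.append(i)
            anchor = slice_feats[i]
    return kept

def nucleus_token_select(attn_weights, p=0.9):
    """Adaptive token selection via cumulative attention (nucleus-style).

    Sorts tokens by attention weight and keeps the smallest prefix whose
    normalized cumulative weight reaches p, so the number of surviving
    tokens varies with each slice's information density."""
    order = np.argsort(attn_weights)[::-1]          # descending by weight
    csum = np.cumsum(attn_weights[order] / attn_weights.sum())
    k = int(np.searchsorted(csum, p)) + 1           # smallest prefix covering p
    return np.sort(order[:k])                       # kept token indices
```

Under this sketch, a slice whose attention mass concentrates on a few tokens is pruned aggressively, while a dense slice keeps more tokens; a fixed-ratio scheme would treat both identically.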
Problem

Research questions and friction points this paper is trying to address.

3D medical image understanding
computational inefficiency
anatomical redundancy
heterogeneous information density
token pruning
Innovation

Methods, ideas, or system contributions that make the work stand out.

token pruning
3D medical imaging
vision-language models
training-free
dynamic attention