🤖 AI Summary
Standard Vision Transformers struggle to model hierarchical anatomical community structures—such as organs, tissues, and lesions—in medical images. To address this, we propose DCMM-Transformer, the first framework to embed a differentiable Degree-Corrected Mixed-Membership (DCMM) model into the self-attention mechanism as a structure-aware additive bias, replacing non-differentiable binary masks. This enables end-to-end modeling of complex anatomical communities while enhancing interpretability. The design ensures training stability, yields anatomically consistent attention maps, and supports cross-modal generalization. Evaluated on multi-center datasets spanning brain, chest, breast, and ocular imaging, DCMM-Transformer significantly outperforms state-of-the-art methods in both accuracy and clinical interpretability, bridging the gap between high performance and anatomical plausibility.
📝 Abstract
Medical images exhibit latent anatomical groupings, such as organs, tissues, and pathological regions, that standard Vision Transformers (ViTs) fail to exploit. While recent work such as SBM-Transformer attempts to incorporate such structures through stochastic binary masking, it suffers from non-differentiability, training instability, and an inability to model complex community structure. We present DCMM-Transformer, a novel ViT architecture for medical image analysis that incorporates a Degree-Corrected Mixed-Membership (DCMM) model as an additive bias in self-attention. Unlike prior approaches that rely on multiplicative masking and binary sampling, our method introduces community structure and degree heterogeneity in a fully differentiable and interpretable manner. Comprehensive experiments across diverse medical imaging datasets, including brain, chest, breast, and ocular modalities, demonstrate the superior performance and generalizability of the proposed approach. Furthermore, the learned group structure and structured attention modulation substantially enhance interpretability by yielding attention maps that are anatomically meaningful and semantically coherent.
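To make the core idea concrete, the following is a minimal NumPy sketch of self-attention with a DCMM-style additive bias. It is an illustration under our own assumptions, not the paper's implementation: we assume per-token mixed-membership vectors `Pi`, a community affinity matrix `B`, and degree-correction factors `theta`, and we add the resulting DCMM affinity term `theta_i * theta_j * (Pi_i @ B @ Pi_j)` directly to the attention logits before the softmax. The actual parameterization (e.g., a log-domain bias, per-head parameters) may differ in the published method.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dcmm_attention(Q, K, V, Pi, B, theta):
    """Self-attention with an additive DCMM bias (illustrative sketch).

    Q, K, V : (n, d) token queries / keys / values
    Pi      : (n, k) row-stochastic mixed-membership vectors
    B       : (k, k) community affinity matrix
    theta   : (n,)   per-token degree-correction factors
    """
    d = Q.shape[-1]
    logits = Q @ K.T / np.sqrt(d)                    # standard scaled dot-product
    bias = np.outer(theta, theta) * (Pi @ B @ Pi.T)  # DCMM expected-affinity term
    attn = softmax(logits + bias, axis=-1)           # additive bias: stays differentiable
    return attn @ V, attn
```

Because the bias enters additively inside a smooth softmax rather than as a sampled binary mask multiplying the attention matrix, gradients flow to `Pi`, `B`, and `theta` end to end, which is the property the abstract contrasts against SBM-style multiplicative masking.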