🤖 AI Summary
To address the underutilization of 3D volumetric information, the scarcity of annotated data, and the high model complexity typical of digital breast tomosynthesis (DBT) methods, this paper proposes M&M-3D—a parameter-free extension of the 2D M&M architecture that enables direct weight transfer from 2D pre-trained models (e.g., on full-field digital mammography, FFDM) to DBT. Without adding any model parameters, it introduces learnable 3D reasoning by fusing malignancy-guided volume-level features with slice-level responses. Its core contribution is a lightweight, feature-mixing-based approach to 3D contextual modeling that avoids explicit 3D convolutions and added architectural complexity. On the BCS-DBT benchmark, M&M-3D improves on the previous top baseline by 4% for classification and 10% for lesion localization. In low-data regimes, it outperforms complex 3D reasoning models by 20–47% for localization and 2–10% for classification, improving sample efficiency and generalizability for DBT cancer detection with limited annotations.
📝 Abstract
Digital Breast Tomosynthesis (DBT) enhances finding visibility for breast cancer detection by providing volumetric information that reduces the impact of overlapping tissues; however, limited annotated data has constrained the development of deep learning models for DBT. To address data scarcity, existing methods attempt to reuse 2D full-field digital mammography (FFDM) models by either flattening DBT volumes or processing slices individually, thus discarding volumetric information. Alternatively, 3D reasoning approaches introduce complex architectures that require more DBT training data. Tackling these drawbacks, we propose M&M-3D, an architecture that enables learnable 3D reasoning while remaining parameter-free relative to its FFDM counterpart, M&M. M&M-3D constructs malignancy-guided 3D features, and 3D reasoning is learned by repeatedly mixing these 3D features with slice-level information. This is achieved by modifying operations in M&M without adding parameters, thus enabling direct weight transfer from FFDM. Extensive experiments show that M&M-3D surpasses 2D projection and 3D slice-based methods by 11-54% for localization and 3-10% for classification. Additionally, M&M-3D outperforms complex 3D reasoning variants by 20-47% for localization and 2-10% for classification in the low-data regime, while matching their performance in the high-data regime. On the popular BCS-DBT benchmark, M&M-3D outperforms the previous top baseline by 4% for classification and 10% for localization.
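The abstract's central idea—mixing malignancy-guided volume-level features back into per-slice features using only parameter-free operations, so 2D weights transfer directly—can be illustrated with a minimal sketch. All names and the specific pooling scheme here are illustrative assumptions, not the paper's actual operations:

```python
import numpy as np

def malignancy_guided_mix(slice_feats, malignancy_logits):
    """Hypothetical sketch of parameter-free 3D feature mixing.

    slice_feats:       (S, C) array, one C-dim feature vector per DBT slice
    malignancy_logits: (S,)   array, per-slice malignancy scores

    Returns (S, C) slice features enriched with volume-level context.
    No learnable parameters are introduced, so weights pre-trained on
    2D FFDM images could, in principle, be reused unchanged.
    """
    # Softmax over slices: attention weights guided by malignancy scores
    w = np.exp(malignancy_logits - malignancy_logits.max())
    w = w / w.sum()

    # Malignancy-weighted volume-level ("3D") feature vector, shape (C,)
    vol_feat = (w[:, None] * slice_feats).sum(axis=0)

    # Mix the volume context back into every slice via a broadcast add;
    # repeating this at multiple depths would approximate "repeated mixing"
    return slice_feats + vol_feat[None, :]
```

In this reading, slices that the 2D backbone already flags as suspicious dominate the shared volume context, which is then broadcast back so every slice "sees" the most malignant evidence in the stack—without any 3D convolutions.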