Small Lesions-aware Bidirectional Multimodal Multiscale Fusion Network for Lung Disease Classification

📅 2025-08-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two challenges in pulmonary disease diagnosis: (1) the high risk of missing small lesions, and (2) the dimensionality mismatch and semantic misalignment between 3D PET-CT volumes and electronic health records (EHR). The authors propose a bidirectional multimodal multiscale fusion network, which the abstract names MMCAF-Net (Multimodal Multiscale Cross-Attention Fusion Network). The model builds an image feature pyramid for multiscale representation, uses a 3D multiscale convolutional attention module to strengthen feature learning from subtle lesions, and applies a multiscale cross-attention mechanism to align and fuse imaging and EHR features at each scale. Experiments on the Lung-PET-CT-Dx dataset report a +2.3% improvement in classification accuracy over state-of-the-art methods.

📝 Abstract

The diagnosis of medical diseases faces challenges such as the misdiagnosis of small lesions. Deep learning, particularly multimodal approaches, has shown great potential in the field of medical disease diagnosis. However, the differences in dimensionality between medical imaging and electronic health record data present challenges for effective alignment and fusion. To address these issues, we propose the Multimodal Multiscale Cross-Attention Fusion Network (MMCAF-Net). This model employs a feature pyramid structure combined with an efficient 3D multi-scale convolutional attention module to extract lesion-specific features from 3D medical images. To further enhance multimodal data integration, MMCAF-Net incorporates a multi-scale cross-attention module, which resolves dimensional inconsistencies, enabling more effective feature fusion. We evaluated MMCAF-Net on the Lung-PET-CT-Dx dataset, and the results showed a significant improvement in diagnostic accuracy, surpassing current state-of-the-art methods. The code is available at https://github.com/yjx1234/MMCAF-Net
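The dimensionality gap described in the abstract can be illustrated with a minimal NumPy sketch of cross-attention at a single scale. This is not the MMCAF-Net implementation; the tensor shapes, the projection width `d`, the random weights, and the helper names `softmax` and `cross_attention` are all assumptions chosen for illustration. The idea shown is only that a 3D feature map can be flattened into tokens and projected into the same space as EHR embeddings, so that either modality can attend to the other.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Scaled dot-product attention where queries and context may have
    different feature widths; the projections map both into width d."""
    q = queries @ w_q                          # (n_q, d)
    k = context @ w_k                          # (n_c, d)
    v = context @ w_v                          # (n_c, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (n_q, n_c)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ v, weights


rng = np.random.default_rng(0)

# Hypothetical inputs: one 3D feature map (C, D, H, W) and 6 EHR fields,
# each already embedded as a 16-dimensional vector.
image_feat = rng.standard_normal((32, 4, 8, 8))
ehr_feat = rng.standard_normal((6, 16))

# Flatten the volume into one token per voxel: (D*H*W, C) = (256, 32).
channels = image_feat.shape[0]
image_tokens = image_feat.reshape(channels, -1).T

d = 24  # shared projection width (assumed)

# Direction 1: image tokens query the EHR tokens.
img_fused, img_attn = cross_attention(
    image_tokens, ehr_feat,
    rng.standard_normal((channels, d)) * 0.1,
    rng.standard_normal((16, d)) * 0.1,
    rng.standard_normal((16, d)) * 0.1,
)

# Direction 2: EHR tokens query the image tokens.
ehr_fused, ehr_attn = cross_attention(
    ehr_feat, image_tokens,
    rng.standard_normal((16, d)) * 0.1,
    rng.standard_normal((channels, d)) * 0.1,
    rng.standard_normal((channels, d)) * 0.1,
)

print(img_fused.shape, ehr_fused.shape)
```

After projection, both outputs share the width `d` even though the inputs started with 32 and 16 features. A multiscale variant would presumably repeat this step at each pyramid level; how the paper combines the levels is not specified in the abstract.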

Problem

Research questions and friction points this paper is trying to address.

Addresses misdiagnosis of small lung lesions in medical imaging

Resolves dimensional inconsistencies in multimodal medical data fusion

Improves accuracy of lung disease classification using deep learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

3D multi-scale convolutional attention module

Multi-scale cross-attention fusion

Feature pyramid structure integration
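As a rough illustration of why a multiscale pyramid matters for small lesions, the sketch below builds a three-level pyramid from a synthetic 3D volume using plain average pooling. This is an assumption-laden stand-in: the paper's pyramid is built from learned features, whereas `avg_pool3d` and `build_pyramid` here are hypothetical helpers operating on raw intensities, and the volume size and lesion position are invented.

```python
import numpy as np


def avg_pool3d(volume, factor):
    """Downsample a (D, H, W) volume by averaging non-overlapping
    factor x factor x factor blocks."""
    d, h, w = volume.shape
    if d % factor or h % factor or w % factor:
        raise ValueError("each dimension must be divisible by factor")
    blocks = volume.reshape(
        d // factor, factor, h // factor, factor, w // factor, factor
    )
    return blocks.mean(axis=(1, 3, 5))


def build_pyramid(volume, levels):
    """Return [full resolution, 1/2 resolution, 1/4 resolution, ...]."""
    pyramid = [volume]
    for _ in range(levels - 1):
        pyramid.append(avg_pool3d(pyramid[-1], 2))
    return pyramid


# Synthetic volume with a single 2x2x2 "lesion" of intensity 1.0.
volume = np.zeros((16, 32, 32))
volume[4:6, 10:12, 10:12] = 1.0

pyramid = build_pyramid(volume, levels=3)
for level in pyramid:
    print(level.shape, level.max())
# (16, 32, 32) 1.0
# (8, 16, 16) 1.0
# (4, 8, 8) 0.125
```

In this toy case the lesion keeps its full intensity at the two finer levels but is diluted to one eighth of it at the coarsest level, which suggests why fine-scale features need to be retained rather than relying on the coarsest representation alone.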

🔎 Similar Papers

Developing a Dual-Stage Vision Transformer Model for Lung Disease Classification