nnMobileNet++: Towards Efficient Hybrid Networks for Retinal Image Analysis

📅 2025-11-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Lightweight CNNs (e.g., MobileNet) struggle to model long-range dependencies and slender vascular structures in retinal image analysis. To address this, we propose nnMobileNet++, an efficient hybrid architecture that integrates dynamic snake convolution (to enhance boundary and vessel detail perception), stage-wise embedded Transformer modules (to capture global contextual information), and multi-scale feature fusion. Additionally, we introduce a self-supervised pretraining strategy specifically designed for retinal images to improve generalization. Evaluated on classification, lesion detection, and vessel segmentation tasks, nnMobileNet++ achieves state-of-the-art or competitive performance across multiple public benchmarks—including DRIVE, STARE, and IDRiD—significantly outperforming existing lightweight models. Crucially, it maintains a favorable trade-off between accuracy and efficiency: parameter count and computational overhead increase only moderately, ensuring both high diagnostic fidelity and clinical deployability on resource-constrained devices.

📝 Abstract
Retinal imaging is a critical, non-invasive modality for the early detection and monitoring of ocular and systemic diseases. Deep learning, particularly convolutional neural networks (CNNs), has made significant progress in automated retinal analysis, supporting tasks such as fundus image classification, lesion detection, and vessel segmentation. As a representative lightweight network, nnMobileNet has demonstrated strong performance across multiple retinal benchmarks while remaining computationally efficient. However, purely convolutional architectures inherently struggle to capture long-range dependencies and to model the irregular lesions and elongated vascular patterns that characterize retinal images, despite the critical importance of vascular features for reliable clinical diagnosis. To further advance this line of work and extend the original vision of nnMobileNet, we propose nnMobileNet++, a hybrid architecture that progressively bridges convolutional and transformer representations. The framework integrates three key components: (i) dynamic snake convolution for boundary-aware feature extraction, (ii) stage-specific transformer blocks introduced after the second down-sampling stage for global context modeling, and (iii) retinal image pretraining to improve generalization. Experiments on multiple public retinal datasets for classification, together with ablation studies, demonstrate that nnMobileNet++ achieves state-of-the-art or highly competitive accuracy while maintaining low computational cost, underscoring its potential as a lightweight yet effective framework for retinal image analysis.
Problem

Research questions and friction points this paper is trying to address.

Develop hybrid network for retinal image analysis
Address long-range dependency capture in retinal images
Maintain computational efficiency while improving accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid architecture combining convolutional and transformer representations
Dynamic snake convolution for boundary-aware feature extraction
Stage-specific transformer blocks for global context modeling
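The hybrid design above can be sketched in miniature: convolutional stages handle local, downsampled feature extraction, and only after the second down-sampling stage are the features flattened into tokens and passed through a transformer-style attention block for global context. The NumPy code below is a minimal illustrative sketch, not the authors' implementation: average pooling stands in for the convolutional stages, a single-head attention with identity projections stands in for the transformer blocks, and all function names are hypothetical.

```python
import numpy as np

def self_attention(x):
    """Single-head scaled dot-product attention over tokens x: (n_tokens, dim).
    Identity Q/K/V projections for brevity (a real block would learn them)."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ x

def conv_stage(x):
    """Stand-in for a convolutional down-sampling stage:
    2x2 average pooling on an (H, W, C) feature map."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def hybrid_forward(img):
    """Two local (convolutional) stages, then a global attention block,
    mirroring the 'transformer after the second down-sampling' placement."""
    f = conv_stage(conv_stage(img))          # stages 1 and 2: local features
    h, w, c = f.shape
    tokens = f.reshape(h * w, c)             # flatten spatial map to tokens
    tokens = tokens + self_attention(tokens) # residual global-context mixing
    return tokens.reshape(h, w, c)
```

The key design point this mirrors is placement: attention is quadratic in token count, so applying it only after two down-sampling stages (here, 8x8 input reduced to 2x2, i.e. 4 tokens) keeps the global-context modeling affordable for a lightweight network.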