🤖 AI Summary
Microscopic image classification (MIC) faces several challenges: a trade-off between global modeling capability and computational efficiency, loss of fine-grained pixel-level information, channel redundancy, and insufficient local perception. To address these, we propose MambaMIC, a lightweight, efficient vision backbone. It introduces a novel local-global dual-branch aggregation module that synergistically integrates local convolutional perception with a selective state space model (SSM). Furthermore, we design a locally aware enhancement filter and a feature modulation interaction aggregation mechanism to mitigate pixel-level forgetting and channel redundancy. Evaluated on five standard MIC benchmarks, MambaMIC achieves state-of-the-art accuracy with significantly fewer parameters and lower FLOPs, while delivering substantial inference speedup. The architecture demonstrates superior representational capacity and strong deployment efficiency, making it particularly suitable for resource-constrained biomedical imaging applications.
📝 Abstract
In recent years, CNN- and Transformer-based methods have made significant progress in Microscopic Image Classification (MIC). However, existing approaches still face a dilemma between global modeling and efficient computation. While the Selective State Space Model (SSM) can model long-range dependencies with linear complexity, it still encounters challenges in MIC, such as local pixel forgetting, channel redundancy, and a lack of local perception. To address these issues, we propose a simple yet efficient vision backbone for MIC tasks, named MambaMIC. Specifically, we introduce a local-global dual-branch aggregation module, the MambaMIC Block, designed to effectively capture and fuse local connectivity and global dependencies. In the local branch, we use local convolutions to capture pixel similarity, mitigating local pixel forgetting and enhancing perception. In the global branch, the SSM extracts global dependencies, while a Locally Aware Enhanced Filter reduces channel redundancy and local pixel forgetting. Additionally, we design a Feature Modulation Interaction Aggregation Module for deep feature interaction and key feature re-localization. Extensive benchmarking shows that MambaMIC achieves state-of-the-art performance across five datasets. Code is available at https://zs1314.github.io/MambaMIC
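The dual-branch idea above can be sketched in simplified form: a local convolutional branch capturing pixel similarity, and a global branch running the standard linear-time selective-scan recurrence h_t = a_t·h_{t-1} + b_t·x_t, y_t = c_t·h_t, with the two outputs fused by addition. This is only an illustrative toy (1D, single channel, hand-picked parameters); the actual MambaMIC Block, its gating, 2D scanning, Locally Aware Enhanced Filter, and aggregation module are not reproduced here.

```python
def local_branch(x, kernel=(0.25, 0.5, 0.25)):
    """Toy local convolution (zero-padded) modeling pixel similarity."""
    k = len(kernel) // 2
    out = []
    for i in range(len(x)):
        s = 0.0
        for j, w in enumerate(kernel):
            idx = i + j - k
            if 0 <= idx < len(x):
                s += w * x[idx]
        out.append(s)
    return out


def selective_scan(x, a, b, c):
    """Diagonal SSM recurrence: h_t = a_t*h_{t-1} + b_t*x_t, y_t = c_t*h_t.

    Linear in sequence length, which is the efficiency argument for SSMs
    over quadratic self-attention.
    """
    h, out = 0.0, []
    for x_t, a_t, b_t, c_t in zip(x, a, b, c):
        h = a_t * h + b_t * x_t
        out.append(c_t * h)
    return out


def dual_branch_block(x):
    """Fuse local perception with global SSM dependencies by addition.

    The input-dependent parameters (a, b, c) below are an arbitrary
    sigmoid-style choice for illustration, not the paper's parameterization.
    """
    import math
    a = [1.0 / (1.0 + math.exp(-v)) for v in x]  # decay in (0, 1)
    b = [1.0 - a_t for a_t in a]
    c = [1.0] * len(x)
    local = local_branch(x)
    global_ = selective_scan(x, a, b, c)
    return [l + g for l, g in zip(local, global_)]
```

For example, `dual_branch_block([1.0, 0.0, -1.0, 2.0])` returns a fused sequence of the same length, where each position mixes a 3-tap local neighborhood with a recurrent summary of the entire prefix.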