🤖 AI Summary
To address the limitations of CNNs in global modeling and of Transformers in computational efficiency for multi-class object counting in remote sensing imagery, this work is the first to apply the Mamba state space model (SSM) in this domain. We propose a cross-scale interaction module and a context-enhanced SSM that jointly capture long-range dependencies and fine-grained local details. By integrating scan-based sequential modeling with hierarchical feature fusion, the approach improves both modeling efficiency and representational capacity. Evaluated on a large-scale real-world remote sensing dataset, the method achieves state-of-the-art performance: it reduces mean absolute error by 18.7% over leading CNN- and Transformer-based counting models and accelerates inference by 2.3×. These results demonstrate the effectiveness and scalability of state space models for counting objects in high-resolution, large-scene remote sensing imagery.
📝 Abstract
Multi-category remote sensing object counting is a fundamental task in computer vision that aims to accurately estimate the number of objects of each category in remote sensing images. Existing methods rely mainly on CNNs and Transformers, but CNNs struggle to capture global dependencies and Transformers are computationally expensive, which limits their effectiveness in remote sensing applications. Recently, Mamba has emerged as a promising alternative in computer vision, modeling global dependencies with linear complexity. Building on this, we propose Mamba-MOC, a Mamba-based network for multi-category remote sensing object counting, which represents the first application of Mamba to this task. Specifically, we propose a cross-scale interaction module to facilitate deep integration of hierarchical features. We then design a context state space model that captures both global and local contextual information and supplies local neighborhood information during the scan process. Experimental results on large-scale real-world scenarios demonstrate that the proposed method achieves state-of-the-art performance compared with several mainstream counting algorithms.
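The abstract does not specify the internals of the context state space model, so the following is only an illustrative sketch under stated assumptions, not Mamba-MOC's actual design. It assumes a simplified Mamba-style selective scan (a diagonal state matrix `A`, an input-dependent step size, and input-dependent `B_t`/`C_t` projections) applied to a row-major flattening of a 2D feature map, with a zero-padded 3×3 depthwise filter standing in for the "local neighborhood information". The function names `depthwise_conv3x3`, `selective_scan` and `context_ssm_block`, the per-channel step-size parameters, and all shapes are hypothetical. What it shows is why the scan is linear: one pass over the L = H·W tokens costs O(L·D·N), whereas self-attention compares all O(L²) token pairs.

```python
import math
import random


def softplus(z):
    # Numerically stable log(1 + e^z), used to keep the step size positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def depthwise_conv3x3(x, w):
    """Zero-padded 'same' depthwise 3x3 filter (hypothetical local-context step).

    x[i][j][d] is an H x W x D feature map and w[a][b][d] a 3 x 3 x D kernel;
    each output value mixes the 3x3 spatial neighborhood of one channel.
    """
    H, W, D = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * D for _ in range(W)] for _ in range(H)]
    for i in range(H):
        for j in range(W):
            for a in range(3):
                for b in range(3):
                    ii, jj = i + a - 1, j + b - 1
                    if 0 <= ii < H and 0 <= jj < W:
                        for d in range(D):
                            out[i][j][d] += w[a][b][d] * x[ii][jj][d]
    return out


def selective_scan(seq, A, W_B, W_C, w_dt, b_dt):
    """Simplified Mamba-style selective scan over a token sequence.

    seq: L tokens of length D; A[d][n] < 0 is a diagonal state matrix;
    W_B, W_C: D x N projections giving input-dependent B_t and C_t;
    w_dt, b_dt: per-channel step-size parameters (an assumed simplification).
    One left-to-right pass costs O(L * D * N): linear in L.
    """
    D, N = len(A), len(A[0])
    h = [[0.0] * N for _ in range(D)]  # hidden state carried across tokens
    ys = []
    for u in seq:
        B_t = [sum(u[d] * W_B[d][n] for d in range(D)) for n in range(N)]
        C_t = [sum(u[d] * W_C[d][n] for d in range(D)) for n in range(N)]
        y = []
        for d in range(D):
            dt = softplus(u[d] * w_dt[d] + b_dt[d])  # input-dependent step size
            for n in range(N):
                # Discretized update: h <- exp(dt * A) * h + dt * B_t * u
                h[d][n] = math.exp(dt * A[d][n]) * h[d][n] + dt * B_t[n] * u[d]
            y.append(sum(h[d][n] * C_t[n] for n in range(N)))
        ys.append(y)
    return ys


def context_ssm_block(feat, conv_w, params):
    """Hypothetical block: local 3x3 context, raster scan, residual add."""
    H, W, D = len(feat), len(feat[0]), len(feat[0][0])
    local = depthwise_conv3x3(feat, conv_w)
    seq = [local[i][j] for i in range(H) for j in range(W)]  # row-major scan order
    ys = selective_scan(seq, *params)
    return [[[ys[i * W + j][d] + feat[i][j][d] for d in range(D)]
             for j in range(W)] for i in range(H)]


def rand(*shape, scale=1.0):
    # Nested list of Gaussian samples with the given shape.
    if len(shape) == 1:
        return [random.gauss(0.0, scale) for _ in range(shape[0])]
    return [rand(*shape[1:], scale=scale) for _ in range(shape[0])]


random.seed(0)
H, W, D, N = 4, 5, 3, 8
A = [[-math.exp(random.gauss(0.0, 1.0)) for _ in range(N)] for _ in range(D)]
params = (A, rand(D, N, scale=0.1), rand(D, N, scale=0.1), rand(D, scale=0.1), [0.0] * D)
out = context_ssm_block(rand(H, W, D), rand(3, 3, D, scale=0.1), params)
print(len(out), len(out[0]), len(out[0][0]))  # 4 5 3
```

Because this sketch scans in a single raster order, each token's state depends only on its predecessors. Vision Mamba variants commonly scan in several directions to compensate. The local filter applied before the scan is one simple way to give every token its spatial neighbors regardless of scan order.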