SceneMixer: Exploring Convolutional Mixing Networks for Remote Sensing Scene Classification

📅 2025-12-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Remote sensing scene classification suffers from limited model generalization because images vary widely in spatial resolution, viewpoint, orientation, and background conditions. To address this, the paper proposes a lightweight convolutional mixing network that alternates multi-scale depthwise convolutions (spatial mixing) with pointwise convolutions (channel mixing), capturing both local detail and contextual information while keeping parameter count and computational cost low. On the AID benchmark the model reaches 74.7% overall accuracy (OA), 74.57% average accuracy (AA), and a Kappa coefficient of 73.79; on EuroSAT it reaches 93.90% OA, 93.93% AA, and a Kappa of 93.22. The authors report that this gives a good balance between accuracy and efficiency compared with widely used CNN- and transformer-based models.

📝 Abstract

Remote sensing scene classification plays a key role in Earth observation by enabling the automatic identification of land use and land cover (LULC) patterns from aerial and satellite imagery. Despite recent progress with convolutional neural networks (CNNs) and vision transformers (ViTs), the task remains challenging due to variations in spatial resolution, viewpoint, orientation, and background conditions, which often reduce the generalization ability of existing models. To address these challenges, this paper proposes a lightweight architecture based on the convolutional mixer paradigm. The model alternates between spatial mixing through depthwise convolutions at multiple scales and channel mixing through pointwise operations, enabling efficient extraction of both local and contextual information while keeping the number of parameters and computations low. Extensive experiments were conducted on the AID and EuroSAT benchmarks. The proposed model achieved overall accuracy, average accuracy, and Kappa values of 74.7%, 74.57%, and 73.79 on the AID dataset, and 93.90%, 93.93%, and 93.22 on EuroSAT, respectively. These results demonstrate that the proposed approach provides a good balance between accuracy and efficiency compared with widely used CNN- and transformer-based models. Code will be publicly available on: https://github.com/mqalkhatib/SceneMixer
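
The three metrics quoted in the abstract can all be derived from a confusion matrix. The following is a minimal sketch of the standard definitions, not the authors' evaluation code; the function name `scene_metrics` and the toy matrix are illustrative assumptions, and Kappa is scaled by 100 on the assumption that the reported values (73.79, 93.22) use that scale.

```python
def scene_metrics(confusion):
    """Return (OA, AA, Kappa), each scaled by 100, from a square confusion matrix.

    confusion[i][j] counts samples of true class i predicted as class j.
    Assumes every class has at least one sample.
    """
    n_classes = len(confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    total = sum(row_totals)
    correct = sum(confusion[i][i] for i in range(n_classes))

    # Overall accuracy: fraction of all samples classified correctly.
    oa = correct / total

    # Average accuracy: mean of per-class recalls, so each class counts equally.
    aa = sum(confusion[i][i] / row_totals[i] for i in range(n_classes)) / n_classes

    # Cohen's kappa: agreement beyond what chance alone would produce.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    kappa = (oa - expected) / (1 - expected)

    return 100 * oa, 100 * aa, 100 * kappa


# Hypothetical 3-class confusion matrix, for illustration only.
toy = [[8, 2, 0],
       [1, 8, 1],
       [0, 2, 8]]
oa, aa, kappa = scene_metrics(toy)
print(f"OA={oa:.2f}  AA={aa:.2f}  Kappa={kappa:.2f}")
```

OA and AA coincide here because the toy classes are balanced; on imbalanced data they can differ, which is one reason to report both.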

Problem

Research questions and friction points this paper is trying to address.

Existing CNN and ViT models often generalize poorly in remote sensing scene classification

Variations in spatial resolution, viewpoint, orientation, and background conditions make the task challenging

Accuracy must be balanced against parameter count and computational cost
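
To make the efficiency argument concrete, here is a rough weight count comparing one dense convolution with one multi-scale depthwise plus pointwise layer. The channel width (256) and kernel sizes (3, 5, 7) are assumptions chosen for illustration; the summary does not state the values the model actually uses, and biases and normalization parameters are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a dense k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def mixer_layer_params(channels, kernel_sizes):
    """Weights in one depthwise branch per kernel size plus a 1x1 pointwise conv."""
    # A depthwise conv has one k x k filter per channel.
    depthwise = sum(k * k * channels for k in kernel_sizes)
    # A pointwise conv is a channels x channels matrix applied at every pixel.
    pointwise = channels * channels
    return depthwise + pointwise


channels = 256  # hypothetical width
dense = standard_conv_params(channels, channels, 7)
mixed = mixer_layer_params(channels, (3, 5, 7))  # hypothetical kernel sizes
print(dense, mixed, round(dense / mixed, 1))
```

Under these assumptions the factorized layer needs 86,784 weights against 3,211,264 for a single dense 7x7 convolution, roughly 37 times fewer.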

Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight convolutional mixer architecture for remote sensing

Alternates spatial mixing via multi-scale depthwise convolutions with channel mixing via pointwise operations

Efficiently extracts local and contextual information with few parameters and low computational cost
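
The alternation of spatial and channel mixing can be sketched in plain Python on nested lists. This is a toy illustration under stated assumptions, not the SceneMixer implementation: the summary does not say how the multi-scale branches are fused, so summing them with a residual connection is an assumption, normalization is omitted, and all function names are hypothetical.

```python
def depthwise_conv(x, kernels):
    """Filter each channel with its own k x k kernel (zero padding, stride 1).

    x is [C][H][W]; kernels is [C][k][k] with odd k. Like most deep learning
    libraries, this computes cross-correlation (the kernel is not flipped).
    """
    channels, height, width = len(x), len(x[0]), len(x[0][0])
    k = len(kernels[0])
    pad = k // 2
    out = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for c in range(channels):
        for i in range(height):
            for j in range(width):
                acc = 0.0
                for di in range(k):
                    for dj in range(k):
                        ii, jj = i + di - pad, j + dj - pad
                        if 0 <= ii < height and 0 <= jj < width:
                            acc += x[c][ii][jj] * kernels[c][di][dj]
                out[c][i][j] = acc
    return out


def pointwise_conv(x, weights):
    """1x1 convolution: mix channels at each pixel. weights is [C_out][C_in]."""
    height, width = len(x[0]), len(x[0][0])
    return [[[sum(w * x[c][i][j] for c, w in enumerate(row))
              for j in range(width)]
             for i in range(height)]
            for row in weights]


def relu(x):
    return [[[max(0.0, v) for v in row] for row in ch] for ch in x]


def mixer_block(x, kernels_per_scale, pointwise_weights):
    """One spatial-mixing step followed by one channel-mixing step."""
    spatial = [[row[:] for row in ch] for ch in x]  # residual copy of the input
    for kernels in kernels_per_scale:               # one depthwise branch per scale
        branch = depthwise_conv(x, kernels)
        # Assumed fusion: add each branch to the running sum.
        spatial = [[[a + b for a, b in zip(ra, rb)]
                    for ra, rb in zip(ca, cb)]
                   for ca, cb in zip(spatial, branch)]
    return relu(pointwise_conv(relu(spatial), pointwise_weights))


# Toy input: 2 channels of 4x4 ones, a 3x3 box filter and a 5x5 identity filter.
x = [[[1.0] * 4 for _ in range(4)] for _ in range(2)]
box3 = [[[1.0] * 3 for _ in range(3)] for _ in range(2)]
ident5 = [[[1.0 if (i, j) == (2, 2) else 0.0 for j in range(5)]
           for i in range(5)] for _ in range(2)]
y = mixer_block(x, [box3, ident5], [[1.0, 1.0], [1.0, -1.0]])
```

The depthwise step never mixes channels and the pointwise step never looks at neighbouring pixels, which is what keeps the two kinds of mixing separate and cheap.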
