SceneMixer: Exploring Convolutional Mixing Networks for Remote Sensing Scene Classification

📅 2025-12-07
🤖 AI Summary
Remote sensing scene classification suffers from limited model generalization because of substantial variations in spatial resolution, viewpoint, orientation, and background conditions. To address this, the paper proposes a lightweight network based on the convolutional mixer paradigm. It alternates depthwise convolutions at multiple scales (spatial mixing) with pointwise convolutions (channel mixing), capturing both local detail and contextual information while keeping the parameter count and computational cost low. On the AID and EuroSAT benchmarks the model reaches 74.7% overall accuracy (OA), 74.57% average accuracy (AA), and a Kappa of 73.79 on AID, and 93.90% OA, 93.93% AA, and a Kappa of 93.22 on EuroSAT. The authors describe this as a good balance between accuracy and efficiency compared with widely used CNN- and transformer-based models.

📝 Abstract
Remote sensing scene classification plays a key role in Earth observation by enabling the automatic identification of land use and land cover (LULC) patterns from aerial and satellite imagery. Despite recent progress with convolutional neural networks (CNNs) and vision transformers (ViTs), the task remains challenging due to variations in spatial resolution, viewpoint, orientation, and background conditions, which often reduce the generalization ability of existing models. To address these challenges, this paper proposes a lightweight architecture based on the convolutional mixer paradigm. The model alternates between spatial mixing through depthwise convolutions at multiple scales and channel mixing through pointwise operations, enabling efficient extraction of both local and contextual information while keeping the number of parameters and computations low. Extensive experiments were conducted on the AID and EuroSAT benchmarks. The proposed model achieved overall accuracy, average accuracy, and Kappa values of 74.7%, 74.57%, and 73.79 on the AID dataset, and 93.90%, 93.93%, and 93.22 on EuroSAT, respectively. These results demonstrate that the proposed approach provides a good balance between accuracy and efficiency compared with widely used CNN- and transformer-based models. Code will be publicly available on: https://github.com/mqalkhatib/SceneMixer
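The three reported metrics can all be derived from a confusion matrix. The sketch below is illustrative only: it assumes the standard definitions (overall accuracy as the fraction of correct predictions, average accuracy as the mean per-class recall, and Cohen's Kappa), which the abstract does not spell out. The function name `scene_metrics` and the toy matrix are hypothetical and are not taken from the paper or its repository.

```python
def scene_metrics(confusion):
    """Return (overall accuracy, average accuracy, Cohen's kappa).

    confusion[i][j] counts samples of true class i predicted as class j.
    Assumes a square matrix in which every class has at least one sample.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))

    # Overall accuracy: correct predictions over all samples.
    oa = correct / total

    # Average accuracy: mean of the per-class recalls.
    row_totals = [sum(row) for row in confusion]
    aa = sum(confusion[i][i] / row_totals[i] for i in range(n_classes)) / n_classes

    # Kappa: agreement beyond what the row/column marginals predict by chance.
    col_totals = [sum(confusion[i][j] for i in range(n_classes))
                  for j in range(n_classes)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    kappa = (oa - expected) / (1 - expected)
    return oa, aa, kappa


# Hypothetical two-class example, not data from the paper.
toy_confusion = [[8, 2],
                 [1, 9]]
oa, aa, kappa = scene_metrics(toy_confusion)
print(f"OA={oa:.2%}  AA={aa:.2%}  Kappa x100={100 * kappa:.2f}")
```

The Kappa values in the abstract (73.79 and 93.22) appear to be on a 0 to 100 scale, which is why the example multiplies by 100 before printing.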
Problem

Research questions and friction points this paper is trying to address.

Develops a lightweight convolutional mixer for remote sensing scene classification
Addresses the loss of generalization caused by variations in spatial resolution, viewpoint, orientation, and background conditions
Balances accuracy and efficiency compared to CNN and transformer models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight convolutional mixer architecture for remote sensing
Alternates spatial mixing via multi-scale depthwise convolutions with channel mixing via pointwise operations
Efficiently extracts local and contextual information with few parameters and computations
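To make the efficiency argument concrete, the following back-of-the-envelope sketch counts weights for one mixing block built from depthwise and pointwise convolutions and compares it with dense convolutions at the same kernel sizes. Everything specific here is an assumption for illustration: the channel width of 256, the kernel sizes (3, 5, 7), the parallel-branch layout, and the omission of biases and normalization layers are not taken from the paper.

```python
def dense_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def mixer_block_params(channels, kernel_sizes):
    """Weights in one hypothetical mixing block (bias ignored).

    Spatial mixing: one depthwise k x k convolution per kernel size, each
    with a single k x k filter per channel. The branches are assumed to be
    fused by addition, which adds no weights.
    Channel mixing: one 1 x 1 (pointwise) convolution, channels -> channels.
    """
    depthwise = sum(k * k * channels for k in kernel_sizes)
    pointwise = channels * channels
    return depthwise + pointwise


channels = 256            # assumed width, for illustration only
kernel_sizes = (3, 5, 7)  # assumed scales, for illustration only

mixer = mixer_block_params(channels, kernel_sizes)
dense = sum(dense_conv_params(channels, channels, k) for k in kernel_sizes)
print(f"mixer block: {mixer:,} weights")
print(f"dense convs: {dense:,} weights")
print(f"reduction:   {dense / mixer:.1f}x")
```

Under these assumptions the depthwise branches contribute only a small share of the weights, and the pointwise layer dominates the block. That is the usual reason depthwise-plus-pointwise designs stay small even when several kernel sizes are used.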
Mohammed Q. Alkhatib
College of Engineering and IT, University of Dubai, Emirates Road - Exit 49, Dubai 14143, UAE
Ali Jamali
Department of Geography, Simon Fraser University, 8888 University Dr W, Burnaby, BC V5A 1S6, Canada
Swalpa Kumar Roy
Alipurduar Govt. Engineering & Management College, West Bengal (IASc. Associate, INSA VSP)
Artificial Intelligence, Earth Observation, HSI, Machine Learning, Optimization