🤖 AI Summary
This work targets multispectral fusion object detection on edge devices, where inference efficiency is low and feature representation is weak under high-resolution inputs. It also targets two problems in existing state space models (SSMs): parameter redundancy, and the loss of fine-grained structural information after compression. The authors propose the Low-Rank Two-Dimensional Selective Structured State Space Model (Low-Rank SS2D), which uses low-rank matrix factorization to reformulate the state transition mechanism and exploit intrinsic feature sparsity. A structure-aware knowledge distillation strategy additionally aligns the student's hidden state dynamics with those of a full-rank teacher. According to the authors, the method compresses the model substantially while preserving high-fidelity spatial modeling, and it outperforms existing lightweight architectures on five benchmark datasets and on real-world edge platforms such as the Raspberry Pi 5, giving a better efficiency-accuracy trade-off.
📝 Abstract
Multispectral fusion object detection is a critical task for edge-based maritime surveillance and remote sensing, demanding both high inference efficiency and robust feature representation for high-resolution inputs. However, the standard 2D Selective Scan (SS2D) blocks of current State Space Models (SSMs) such as Mamba carry significant parameter redundancy, which hinders deployment on resource-constrained hardware, and conventional compression of these blocks loses fine-grained structural information. To address these challenges, we propose the Low-Rank Two-Dimensional Selective Structured State Space Model (Low-Rank SS2D), which reformulates state transitions via matrix factorization to exploit intrinsic feature sparsity. Furthermore, we introduce a Structure-Aware Distillation strategy that aligns the internal latent state dynamics of the student with those of a full-rank teacher model to compensate for potential representation degradation. This approach substantially reduces computational complexity and memory footprint while preserving the high-fidelity spatial modeling required for object recognition. Extensive experiments on five benchmark datasets and real-world edge platforms, such as the Raspberry Pi 5, demonstrate that our method achieves a superior efficiency-accuracy trade-off, significantly outperforming existing lightweight architectures in practical deployment scenarios.
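The two ideas named in the abstract, low-rank factorization of a projection and alignment of student and teacher hidden states, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which SS2D matrices are factorized or which distillation loss is used. The function names, the example sizes (256 and rank 16), and the choice of mean squared error as the alignment loss are all assumptions made here for illustration.

```python
# Illustrative sketch only. All names, sizes, and the MSE loss are assumptions;
# the abstract does not specify the actual factorization or distillation loss.

def dense_params(d_in, d_out):
    """Parameter count of a full (dense) d_out x d_in projection matrix."""
    return d_in * d_out


def low_rank_params(d_in, d_out, rank):
    """Parameter count when the projection is stored as U (d_out x rank) and V (rank x d_in)."""
    return d_out * rank + rank * d_in


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def low_rank_apply(u, v, x):
    """Compute U @ (V @ x) without ever forming the dense product U @ V."""
    return matvec(u, matvec(v, x))


def hidden_state_mse(student_states, teacher_states):
    """Mean squared error between two equally shaped sequences of hidden-state vectors.

    Assumes student and teacher states already have the same shape; a real
    setup may need a projection between them.
    """
    total = 0.0
    count = 0
    for s_vec, t_vec in zip(student_states, teacher_states):
        for s, t in zip(s_vec, t_vec):
            total += (s - t) ** 2
            count += 1
    return total / count


if __name__ == "__main__":
    # Hypothetical sizes: a 256 x 256 projection replaced by rank-16 factors.
    print(dense_params(256, 256), low_rank_params(256, 256, 16))
    # Tiny rank-1 example: U is 2 x 1, V is 1 x 2.
    print(low_rank_apply([[1], [2]], [[3, 4]], [1, 1]))
    # Alignment loss between one student and one teacher hidden state.
    print(hidden_state_mse([[1.0, 2.0]], [[1.0, 4.0]]))
```

With these hypothetical sizes the factorized form stores 8,192 parameters instead of 65,536. The saving grows as the rank shrinks relative to the layer width, and that is also the regime in which a distillation term of this kind would be expected to matter most.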