🤖 AI Summary
To address key challenges in IoT multimodal fusion, namely high model complexity, shallow inter-modal relationship modeling that relies solely on unidirectional alignment, and poor robustness when sensor data goes missing, this paper proposes GRAM-MAMBA, a lightweight, efficient, and robust framework. Methodologically, it employs the linear-complexity Mamba model for temporal sensor data, uses an optimized GRAM-matrix strategy to align every pair of modalities rather than aligning all modalities to a single anchor, and incorporates LoRA-inspired adaptive low-rank compensation layers that can be incrementally fine-tuned after training to adapt to missing modalities. Evaluated on the SPAWC2021 indoor positioning and USC-HAD activity recognition datasets, GRAM-MAMBA reaches 93.55% F1 on USC-HAD, outperforming prior work, and its adaptation strategy yields a 24.5% localization improvement and a 23% F1 gain under missing modalities while fine-tuning only 0.2-0.3% of the parameters, significantly outperforming baselines.
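As a rough illustration of the GRAM-matrix alignment idea (the paper's exact objective may differ), the sketch below builds the Gram matrix of L2-normalized per-modality embeddings for each sample and penalizes the volume it spans, which shrinks toward zero as the modality embeddings become mutually aligned. The function name `gram_volume_loss` and the volume-based penalty are illustrative assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F


def gram_volume_loss(embeddings: list[torch.Tensor], eps: float = 1e-6) -> torch.Tensor:
    """Pairwise alignment via the Gram matrix of per-modality embeddings.

    embeddings: list of k tensors, each of shape (batch, d), one per modality.
    Returns the mean k-volume spanned by the k normalized modality vectors;
    a small volume means the modalities are mutually aligned.
    """
    # Stack into (batch, k, d), each modality vector L2-normalized.
    A = torch.stack([F.normalize(e, dim=-1) for e in embeddings], dim=1)
    # Gram matrix per sample: pairwise cosine similarities, shape (batch, k, k).
    G = A @ A.transpose(1, 2)
    # Squared k-volume = det(G); the jitter keeps the determinant numerically stable.
    k = A.shape[1]
    vol_sq = torch.det(G + eps * torch.eye(k, device=A.device, dtype=A.dtype))
    return vol_sq.clamp(min=0).sqrt().mean()


# Hypothetical usage: three modality encoders producing 128-dim embeddings.
imu, wifi, rssi = (torch.randn(32, 128) for _ in range(3))
loss = gram_volume_loss([imu, wifi, rssi])
```

In practice such a term would be added to the task loss so that, for example, IMU and WiFi embeddings of the same sample land in a shared region of the latent space.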
📝 Abstract
Multi-modal fusion is crucial for Internet of Things (IoT) perception and is widely deployed in smart homes, intelligent transport, industrial automation, and healthcare. However, existing systems often face challenges: high model complexity hinders deployment in resource-constrained environments, unidirectional modal alignment neglects inter-modal relationships, and robustness suffers when sensor data is missing. These issues impede efficient and robust multimodal perception in real-world IoT settings. To overcome these limitations, we propose GRAM-MAMBA. This framework utilizes the linear-complexity Mamba model for efficient sensor time-series processing, combined with an optimized GRAM-matrix strategy for pairwise alignment among modalities, addressing the shortcomings of traditional unidirectional alignment. Inspired by Low-Rank Adaptation (LoRA), we introduce an adaptive low-rank layer compensation strategy to handle missing modalities after training. This strategy freezes the pre-trained model core and the adaptive layers of unavailable modalities, fine-tuning only those related to the available modalities and the fusion process. Extensive experiments validate GRAM-MAMBA's effectiveness. On the SPAWC2021 indoor positioning dataset, the pre-trained model shows lower error than baselines, and adapting to missing modalities yields a 24.5% performance boost while training less than 0.2% of parameters. On the USC-HAD human activity recognition dataset, it achieves 93.55% F1 and 93.81% Overall Accuracy (OA), outperforming prior work, and the update strategy increases F1 by 23% while training less than 0.3% of parameters. These results highlight GRAM-MAMBA's potential for efficient and robust multimodal perception in resource-constrained environments.
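To make the freeze-and-fine-tune adaptation concrete, here is a minimal PyTorch sketch of the pattern the abstract describes: the pre-trained weights stay frozen, and only low-rank adapters attached to the still-available modality branches and the fusion stage receive gradients. The `LoRALinear` wrapper, the branch naming convention ("imu", "wifi", "fusion"), and the rank/alpha defaults are assumptions for illustration, not the paper's actual module layout.

```python
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """Frozen pre-trained linear layer plus a trainable low-rank update (Wx + B A x)."""

    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # keep the pre-trained core frozen
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))  # starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)


def select_trainable(model: nn.Module, available: set[str]) -> None:
    """Enable gradients only for adapters on available modality branches or the fusion stage."""
    for name, param in model.named_parameters():
        branch = name.split(".")[0]  # assumes top-level submodules are named per modality
        is_adapter = "lora_" in name
        param.requires_grad_(is_adapter and (branch in available or branch == "fusion"))


# Hypothetical usage: the WiFi sensor dropped out, so only IMU and fusion adapters train.
model = nn.ModuleDict({
    "imu": LoRALinear(nn.Linear(64, 128)),
    "wifi": LoRALinear(nn.Linear(32, 128)),
    "fusion": LoRALinear(nn.Linear(128, 10)),
})
select_trainable(model, available={"imu"})
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
```

Only the `lora_A`/`lora_B` tensors of the surviving branches and the fusion head end up trainable, which is the kind of selective update that keeps the adapted parameter count in the sub-percent range the abstract reports.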