Dynamical Multimodal Fusion with Mixture-of-Experts for Localizations

📅 2025-07-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper targets two critical bottlenecks in multi-modal fingerprinting localization for 6G integrated sensing and communication (ISAC). First, each modality's contribution drifts as the carrier frequency changes. Second, spatial and fingerprint ambiguities degrade accuracy under dynamic spectrum and non-line-of-sight (NLOS) conditions. To address them, it proposes a spatial-context-aware dynamic fusion network. The authors describe it as the first work to apply a large-scale multi-modal mixture-of-experts (MoE) to frequency-robust ISAC localization. It features a modality-task dual-level architecture and a learnable routing mechanism. The framework combines trajectory-clustering-based representation learning, multi-task coordinate regression, and maximum mean discrepancy (MMD) regularization to jointly improve frequency adaptability and expert diversity. Evaluated across three real-world urban environments and three carrier frequencies (2.6, 6, and 28 GHz), the method achieves consistently sub-meter mean squared error (MSE). In unseen NLOS scenarios, it roughly halves the localization error of the best prior method.
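
The MMD regularizer mentioned above can be illustrated with a small sketch. The summary does not give the kernel, the estimator, or how MMD enters the objective. The sketch therefore assumes three things: a Gaussian RBF kernel, the biased (V-statistic) estimate of squared MMD, and a hypothetical `expert_diversity_loss` that rewards large pairwise MMD between the experts' output distributions. It is a pure-Python illustration of the idea, not the paper's loss.

```python
import math
from itertools import combinations
from typing import List, Sequence

Vector = Sequence[float]


def rbf_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """Gaussian RBF kernel: exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(xs: List[Vector], ys: List[Vector], sigma: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    k_xx = sum(rbf_kernel(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(rbf_kernel(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(rbf_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


def expert_diversity_loss(expert_outputs: List[List[Vector]], sigma: float = 1.0) -> float:
    """Hypothetical auxiliary loss: negative mean pairwise MMD^2 between experts.

    expert_outputs[e] holds expert e's output vectors for one batch.
    Minimizing this value pushes the experts' output distributions apart.
    """
    if len(expert_outputs) < 2:
        raise ValueError("need at least two experts to measure diversity")
    pairs = list(combinations(range(len(expert_outputs)), 2))
    total = sum(mmd_squared(expert_outputs[i], expert_outputs[j], sigma) for i, j in pairs)
    return -total / len(pairs)
```

In training, a term like this would typically be scaled by a small weight and added to the coordinate-regression loss. The kernel bandwidth `sigma` would also need tuning to the scale of the expert features.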

📝 Abstract

Multimodal fingerprinting is a crucial technique for sub-meter 6G integrated sensing and communications (ISAC) localization, but two hurdles block deployment. (i) The contribution each modality makes to estimating the target position varies with operating conditions such as carrier frequency. (ii) Spatial and fingerprint ambiguities markedly undermine localization accuracy, especially in non-line-of-sight (NLOS) scenarios. To solve these problems, we introduce SCADF-MoE, a spatial-context-aware dynamic fusion network built on a soft mixture-of-experts (MoE) backbone. SCADF-MoE first clusters neighboring points into short trajectories to inject explicit spatial context. It then adaptively fuses channel state information (CSI), the angle-of-arrival (AoA) profile, distance, and gain through its learnable MoE router, so that the most reliable cues dominate at each carrier band. The fused representation is fed to a modality-task MoE that simultaneously regresses the coordinates of every vertex in the trajectory and of the trajectory's centroid, thereby exploiting inter-point correlations. Finally, an auxiliary maximum mean discrepancy (MMD) loss enforces expert diversity and mitigates gradient interference, stabilizing multi-task training. On three real urban layouts and three carrier bands (2.6, 6, and 28 GHz), the model delivers consistent sub-meter mean squared error (MSE) and halves the unseen-NLOS error of the best prior work. To our knowledge, this is the first work that leverages a large-scale multimodal MoE for frequency-robust ISAC localization.
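
The adaptive fusion step can be sketched as a softmax-gated weighted sum of per-modality embeddings. This is a simpler relative of the soft MoE layer the abstract names, not SCADF-MoE's router. The linear router, the function name `gated_fusion`, the modality keys (`csi`, `aoa`, `distance`, `gain`), and the assumption that every modality is already embedded to the same dimension are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit first)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(
    embeddings: Dict[str, List[float]],
    router_weights: Dict[str, List[float]],
    router_bias: Dict[str, float],
) -> Tuple[Dict[str, float], List[float]]:
    """Softmax-gated fusion of same-sized modality embeddings (hypothetical router).

    The router scores each modality with a linear function of the concatenated
    embeddings, turns the scores into gates with softmax, and returns the gates
    together with the gate-weighted sum of the embeddings.
    """
    names = sorted(embeddings)  # fixed modality order
    router_input = [v for name in names for v in embeddings[name]]
    logits = [
        sum(w * x for w, x in zip(router_weights[name], router_input)) + router_bias[name]
        for name in names
    ]
    gates = softmax(logits)
    dim = len(embeddings[names[0]])
    fused = [sum(g * embeddings[name][i] for g, name in zip(gates, names)) for i in range(dim)]
    return dict(zip(names, gates)), fused


# Toy 2-D embeddings for the four modalities named in the abstract.
emb = {"csi": [1.0, 0.0], "aoa": [0.0, 1.0], "distance": [1.0, 1.0], "gain": [0.0, 0.0]}
zero_weights = {name: [0.0] * 8 for name in emb}  # 4 modalities x 2 dims = 8 router inputs
gates, fused = gated_fusion(emb, zero_weights, {name: 0.0 for name in emb})
print(gates)  # every gate is 0.25 when the router has no preference
print(fused)  # [0.5, 0.5]
```

Raising the CSI bias to 10.0, for instance, drives the CSI gate above 0.99. Learning such weights, possibly per carrier band, is what would let the most reliable cue dominate. The abstract does not specify how SCADF-MoE conditions its router on the band.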

Problem

Research questions and friction points this paper is trying to address.

Dynamically fusing modalities whose contributions vary with operating conditions (e.g., carrier frequency) for 6G localization

Reducing spatial and fingerprint ambiguities in NLOS scenarios

Achieving frequency-robust sub-meter accuracy in ISAC systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Spatial-context aware dynamic fusion network

Learnable MoE router for adaptive fusion

Modality-task MoE exploiting inter-point correlations
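
The trajectory construction and the vertex-plus-centroid regression target can be sketched in a few functions. The abstract does not say how neighboring points are clustered or how the two regression tasks are weighted. The greedy nearest-neighbour chaining, the `centroid_weight` parameter, and all function names here are therefore hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def group_into_trajectories(points: Sequence[Point], length: int) -> List[List[Point]]:
    """Greedy nearest-neighbour chaining (hypothetical stand-in for the paper's clustering).

    Starts from the first unassigned point and repeatedly appends the nearest
    unassigned point until the trajectory has `length` vertices. Leftover
    points that cannot fill a whole trajectory are dropped.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    remaining = list(points)
    trajectories = []
    while len(remaining) >= length:
        current = [remaining.pop(0)]
        while len(current) < length:
            last = current[-1]
            nearest = min(range(len(remaining)), key=lambda i: math.dist(last, remaining[i]))
            current.append(remaining.pop(nearest))
        trajectories.append(current)
    return trajectories


def centroid(trajectory: Sequence[Point]) -> Point:
    n = len(trajectory)
    return (sum(p[0] for p in trajectory) / n, sum(p[1] for p in trajectory) / n)


def mean_sq_error(pred: Sequence[Point], true: Sequence[Point]) -> float:
    """Mean over points of the squared Euclidean error."""
    return sum((px - tx) ** 2 + (py - ty) ** 2 for (px, py), (tx, ty) in zip(pred, true)) / len(pred)


def multitask_loss(pred_vertices, pred_centroid, true_vertices, centroid_weight=1.0) -> float:
    """Vertex regression loss plus a weighted centroid regression loss."""
    vertex_term = mean_sq_error(pred_vertices, true_vertices)
    centroid_term = mean_sq_error([pred_centroid], [centroid(true_vertices)])
    return vertex_term + centroid_weight * centroid_term


pts = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (10.0, 11.0)]
trajs = group_into_trajectories(pts, length=2)
print(trajs)  # [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
shifted = [(x + 1.0, y) for x, y in trajs[0]]
print(multitask_loss(shifted, (1.0, 0.5), trajs[0]))  # 2.0
```

Regressing the centroid alongside each vertex adds a second target that ties the points of a trajectory together. This is one plausible reading of "exploiting inter-point correlations", not a confirmed description of the paper's loss.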

🔎 Similar Papers

Completed Feature Disentanglement Learning for Multimodal MRIs Analysis