AI Summary
To address low localization accuracy and high computational overhead in NLOS-dominated complex indoor wireless environments (e.g., factories), this paper proposes a high-accuracy, lightweight localization method tailored for resource-constrained devices. We design a Sensor Snapshot Tokenization (SST) mechanism that preserves the key variable-specific characteristics of the Power Delay Profile (PDP) while enhancing multi-source correlation modeling. We further introduce a lightweight Swish-Gated Linear Unit Transformer (L-SwiGLU), which significantly reduces parameter count and computational complexity. Evaluated in a real-world factory environment with severe NLOS conditions, our method achieves a 90th-percentile localization error of 0.355 m, improving on the baseline tokenization method by 8.51%. Remarkably, it attains a 46.13% error reduction while using only 1/14.1 of the baseline's parameters, thereby jointly optimizing accuracy, robustness, and deployment efficiency.
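The core idea behind SST, as described above, is that each sensor's PDP snapshot is kept as its own token rather than being flattened or patch-mixed, so attention compares whole per-sensor delay profiles. A minimal NumPy sketch of this tokenization step is below; all shapes, names, and the linear embedding are illustrative assumptions, not the authors' actual implementation:

```python
import numpy as np

# Hedged sketch of Sensor Snapshot Tokenization (SST): one token per sensor,
# where each token is that sensor's full power delay profile (PDP) snapshot
# projected into the model dimension. Dimensions below are made up.
rng = np.random.default_rng(0)

num_sensors, num_taps, d_model = 6, 64, 32          # assumed sizes
pdp = rng.standard_normal((num_sensors, num_taps))  # one PDP row per sensor

# Linear token embedding (illustrative): rows stay separate, so the
# variable-specific (per-sensor) structure of the PDP is preserved.
W_embed = rng.standard_normal((num_taps, d_model)) / np.sqrt(num_taps)
tokens = pdp @ W_embed                              # shape (num_sensors, d_model)

print(tokens.shape)
```

Because each query/key pair in subsequent self-attention then relates two sensors' complete delay profiles, cross-sensor (multi-source) correlations are modeled directly, which is the property the summary attributes to SST.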
Abstract
Indoor localization in challenging non-line-of-sight (NLOS) environments often yields mediocre accuracy with traditional approaches. Deep learning (DL) has been applied to tackle these challenges; however, many DL approaches overlook computational complexity, especially floating-point operations (FLOPs), making them unsuitable for resource-limited devices. Transformer-based models have achieved remarkable success in natural language processing (NLP) and computer vision (CV) tasks, motivating their use in wireless applications. However, their use in indoor localization remains nascent, and directly applying Transformers to indoor localization can be computationally intensive and limited in accuracy. To address these challenges, we introduce a novel tokenization approach, referred to as Sensor Snapshot Tokenization (SST), which preserves variable-specific representations of the power delay profile (PDP) and enhances the attention mechanism by effectively capturing multi-variate correlations. Complementing this, we propose a lightweight Swish-Gated Linear Unit-based Transformer (L-SwiGLU Transformer) model, designed to reduce computational complexity without compromising localization accuracy. Together, these contributions mitigate the computational burden and the dependency on large datasets, making Transformer models more efficient and better suited to resource-constrained scenarios. The proposed tokenization method enables a Vanilla Transformer to achieve a 90th-percentile positioning error of 0.388 m in a highly NLOS indoor factory, surpassing conventional tokenization methods. The L-SwiGLU ViT further reduces the error to 0.355 m, an 8.51% improvement. Moreover, the proposed model outperforms a 14.1 times larger model by 46.13%, underscoring its computational efficiency.
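The L-SwiGLU Transformer's feed-forward layer is built on the Swish-Gated Linear Unit. A generic SwiGLU block (in the style popularized by Shazeer's "GLU Variants Improve Transformer") can be sketched as follows; the weight shapes, the `d_ff` size, and the function names are assumptions for illustration, not the paper's exact lightweight configuration:

```python
import numpy as np

def swish(x):
    # Swish / SiLU activation: x * sigmoid(x), written stably as x / (1 + e^{-x}).
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, W_gate, W_up, W_down):
    # SwiGLU feed-forward block: the Swish-activated gate branch elementwise
    # multiplies the linear branch, then a down-projection restores d_model.
    return (swish(x @ W_gate) * (x @ W_up)) @ W_down

rng = np.random.default_rng(0)
d_model, d_ff = 32, 64  # assumed sizes; a lightweight variant would shrink these
x = rng.standard_normal((6, d_model))            # 6 tokens (e.g., one per sensor)

W_gate = rng.standard_normal((d_model, d_ff)) / np.sqrt(d_model)
W_up   = rng.standard_normal((d_model, d_ff)) / np.sqrt(d_model)
W_down = rng.standard_normal((d_ff, d_model)) / np.sqrt(d_ff)

out = swiglu_ffn(x, W_gate, W_up, W_down)
print(out.shape)
```

Note the parameter trade-off: a SwiGLU FFN carries three weight matrices instead of the usual two, so lightweight variants typically reduce `d_ff` to keep the overall parameter count and FLOPs down.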