LWM-CDE: A Representation Space for Wireless Data Reasoning and Transferability

📅 2026-05-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the poor generalization and transferability of machine learning models in wireless communication. Environmental heterogeneity, data diversity, and the scarcity of real-world data make it hard to predict how a model will behave in a new deployment, so deployment decisions are unreliable. The authors propose LWM-CDE, a framework that builds a structured embedding space for wireless datasets. Starting from a pretrained wireless foundation model, it fine-tunes dataset embeddings with a combination of contrastive and geometry-shaping losses so that distances between embeddings reflect actual transfer performance. The resulting space supports source dataset selection, label-aware augmentation, and budgeted pretraining. On multiple wireless benchmarks, LWM-CDE correlates more strongly with empirical transfer performance than existing metrics while incurring lower computational cost, which supports more efficient model deployment and life cycle management.
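To make the idea of combining a contrastive term with a geometry-shaping term concrete, here is a minimal NumPy sketch. It is not the authors' implementation: this page does not give the exact form of either loss. The margin-based contrastive loss is the classic textbook version, and the geometry term (matching embedding distance to a measured transfer gap) is purely an assumption for illustration. The names `same_group`, `transfer_gap`, and `weight` are hypothetical.

```python
import numpy as np

def pairwise_distances(z):
    """Euclidean distance between every pair of rows of z: (n, d) -> (n, n)."""
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def contrastive_loss(z, same_group, margin=1.0):
    """Classic margin-based contrastive loss over all distinct pairs.

    same_group[i, j] is True when datasets i and j should sit close together
    (a hypothetical labeling, e.g. datasets that transfer well to each other).
    """
    same_group = np.asarray(same_group, dtype=bool)
    upper = np.triu_indices(len(z), k=1)      # each unordered pair once
    d = pairwise_distances(z)[upper]
    pos = same_group[upper]
    pull = pos * d ** 2                                # draw similar pairs together
    push = (~pos) * np.maximum(0.0, margin - d) ** 2   # push dissimilar pairs past the margin
    return float((pull + push).mean())

def geometry_loss(z, transfer_gap):
    """Assumed geometry term: embedding distance should match a measured transfer gap."""
    upper = np.triu_indices(len(z), k=1)
    d = pairwise_distances(z)[upper]
    return float(((d - np.asarray(transfer_gap)[upper]) ** 2).mean())

def total_loss(z, same_group, transfer_gap, weight=0.5, margin=1.0):
    """Weighted sum of the two terms; `weight` is an illustrative hyperparameter."""
    return contrastive_loss(z, same_group, margin) + weight * geometry_loss(z, transfer_gap)
```

In a real training loop these terms would be written in an autodiff framework and minimized with respect to the foundation model's parameters. The NumPy version only shows what each term rewards.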

📝 Abstract

Machine learning deployments in real-world wireless communication tasks face significant generalization challenges due to location- and environment-specific signal structure, high diversity in data across different deployments, and limited availability of real-world data. Current approaches for assessing data similarity between training and inference (deployment) distributions, as well as evaluating model transferability, suffer from high computational costs and inconsistent performance, leaving critical model deployment and model life cycle management decisions without a principled foundation. To address this, we introduce a dataset similarity framework built upon the feature space of a pretrained wireless foundation model. Our method, LWM-CDE (Contrastive learning of Dataset Embedding), fine-tunes the dataset embeddings of the foundation model using a combination of contrastive and geometry-shaping losses, creating a structured manifold where distance reliably indicates transferability. Extensive experiments on wireless benchmarks show that LWM-CDE achieves stronger correlation with empirical transfer performance than existing metrics while being more computationally efficient. The learned representation space supports more effective and data-efficient decision-making for tasks like source dataset selection, label-aware augmentation, and budgeted pretraining, demonstrating its broader utility across different wireless communication applications.
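If distance in the embedding space tracks transferability, source dataset selection reduces to a nearest-neighbor lookup, and the quality of the space can be checked with a rank correlation against measured transfer results. The sketch below illustrates that workflow under stated assumptions: the dataset names, embeddings, and error values are made-up toy numbers, and `rank_sources` and `spearman` are hypothetical helpers rather than part of LWM-CDE.

```python
import numpy as np

def rank_sources(target_emb, source_embs):
    """Return (name, distance) pairs sorted by Euclidean distance to the target, nearest first."""
    dists = {name: float(np.linalg.norm(emb - target_emb))
             for name, emb in source_embs.items()}
    return sorted(dists.items(), key=lambda kv: kv[1])

def spearman(x, y):
    """Spearman rank correlation; this simple version assumes no tied values."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])

# Toy dataset embeddings (invented for illustration only).
target = np.array([0.0, 0.0])
sources = {
    "city_a": np.array([0.1, 0.2]),
    "city_b": np.array([1.0, 1.0]),
    "indoor": np.array([3.0, -2.0]),
}
ranking = rank_sources(target, sources)
best_source = ranking[0][0]  # nearest dataset is the suggested pretraining source

# Hypothetical measured error after transferring from each source to the target.
transfer_error = {"city_a": 0.05, "city_b": 0.12, "indoor": 0.40}
names = [name for name, _ in ranking]
rho = spearman([dist for _, dist in ranking],
               [transfer_error[name] for name in names])
```

A correlation close to 1 would mean that closer datasets do transfer better, which is the property the abstract claims for the learned space. The lookup itself costs one distance computation per candidate source, with no additional model training.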

Problem

Research questions and friction points this paper is trying to address.

wireless communication

model transferability

dataset similarity

generalization challenge

data scarcity

Innovation

Methods, ideas, or system contributions that make the work stand out.

dataset embedding

contrastive learning

transferability