🤖 AI Summary
This work addresses the scarcity and high acquisition cost of labeled radio frequency data, which hinder deep learning in wireless communications. The authors propose LLM-AUG, a framework that leverages the in-context learning capability of large language models (LLMs) for wireless data augmentation. Using structured prompts, the method generates synthetic samples directly in a learned embedding space without training a dedicated generative model. Evaluated on the RadioML 2016.10A and Interference Classification (IC) datasets, LLM-AUG improves robustness and data efficiency under low-data and distribution-shift conditions, with relative gains of 67.6% and 35.7%, respectively, over the diffusion-based baseline. It reaches near-oracle performance using only 15% of the labeled data and yields a 29.4% relative gain over diffusion-based augmentation at a lower signal-to-noise ratio (SNR).
📝 Abstract
Data scarcity remains a fundamental bottleneck in applying deep learning to wireless communication problems, particularly in scenarios where collecting labeled Radio Frequency (RF) data is expensive, time-consuming, or operationally constrained. This paper proposes LLM-AUG, a data augmentation framework that leverages in-context learning in large language models (LLMs) to generate synthetic training samples directly in a learned embedding space. Unlike conventional generative approaches that require training task-specific models, LLM-AUG performs data generation through structured prompting, enabling rapid adaptation in low-shot regimes. We evaluate LLM-AUG on two representative tasks, modulation classification and interference classification, using the RadioML 2016.10A dataset and the Interference Classification (IC) dataset, respectively. Results show that LLM-AUG consistently outperforms traditional augmentation and deep generative baselines across low-shot settings and reaches near-oracle performance using only 15% of the labeled data. LLM-AUG further demonstrates improved robustness under distribution shifts, yielding a 29.4% relative gain over diffusion-based augmentation at a lower SNR value. On the RadioML and IC datasets, LLM-AUG yields relative gains of 67.6% and 35.7%, respectively, over the diffusion-based baseline. The t-SNE visualizations further validate that synthetic samples generated by LLM-AUG better preserve class structure in the embedding space, leading to more consistent and informative augmentations. These results demonstrate that LLMs can serve as effective and practical data augmenters for wireless machine learning, enabling robust and data-efficient learning in evolving wireless environments.
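To make the idea of generating embedding-space samples through structured prompting more concrete, the following is a minimal sketch and not the authors' implementation. The prompt layout, the function names `build_prompt` and `parse_response`, and the one-JSON-list-per-line output format are all assumptions made for illustration; the abstract does not specify them. The LLM call itself is deliberately left out.

```python
import json


def build_prompt(support, target_label, n_new, decimals=3):
    """Format few-shot embeddings as an in-context prompt.

    `support` maps a class label to a list of embedding vectors.
    The textual layout below is hypothetical, not taken from the paper.
    """
    lines = [
        "You are given labeled embedding vectors of RF signals.",
        "Each example is a JSON list of floats.",
        "",
    ]
    for label, vectors in support.items():
        for vec in vectors:
            # Round to keep the prompt short; precision is an assumed choice.
            rounded = [round(float(x), decimals) for x in vec]
            lines.append(f"class={label} embedding={json.dumps(rounded)}")
    lines.append("")
    lines.append(
        f"Generate {n_new} new embeddings for class={target_label}, "
        "one JSON list per line."
    )
    return "\n".join(lines)


def parse_response(text, dim):
    """Keep only lines that parse as a list of `dim` numbers; drop the rest."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("["):
            continue
        try:
            vec = json.loads(line)
        except json.JSONDecodeError:
            continue
        is_numeric = isinstance(vec, list) and all(
            isinstance(x, (int, float)) for x in vec
        )
        if is_numeric and len(vec) == dim:
            samples.append([float(x) for x in vec])
    return samples
```

In such a setup, the parsed vectors would be appended to the labeled embeddings of the target class before training the downstream classifier. Validating the dimensionality of each returned vector guards against malformed LLM output, which is one reason to keep the parsing step separate from prompt construction.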