Studying the Role of Synthetic Data for Machine Learning-based Wireless Networks Traffic Forecasting

📅 2026-01-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the high acquisition cost, privacy sensitivity, and limited scalability of real-world wireless network traffic data. To reduce reliance on extensive real data, the authors propose a synthetic data generation method based on first-order autoregressive noise statistics that produces realistic, statistically rich Wi-Fi access point (AP) traffic sequences from only minimal real data. Experimental results show that models trained on this synthetic data achieve a mean absolute error (MAE) only 10–15% higher than models trained on real data from the same APs. In cross-AP generalization settings, the synthetic-data-trained models improve prediction accuracy by up to 50% over real-data-trained baselines, which the authors attribute to the greater variability and diversity of the generated traces.
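The comparison above rests on MAE and on a relative gap between two MAE values. A minimal sketch of both computations follows, using only the Python standard library. The function names and the numbers in the usage lines are hypothetical and chosen for illustration; they are not results from the paper.

```python
def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over paired observations."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_mae_gap(mae_candidate, mae_baseline):
    """Fractional increase of the candidate MAE over the baseline MAE."""
    if mae_baseline <= 0:
        raise ValueError("baseline MAE must be positive")
    return (mae_candidate - mae_baseline) / mae_baseline


# Illustrative values only (hypothetical traffic loads and forecasts).
observed = [10.0, 12.0, 9.0, 14.0]
forecast_real_trained = [11.0, 12.0, 10.0, 13.0]
forecast_synth_trained = [11.0, 13.0, 10.0, 13.0]

mae_real = mean_absolute_error(observed, forecast_real_trained)
mae_synth = mean_absolute_error(observed, forecast_synth_trained)
gap = relative_mae_gap(mae_synth, mae_real)
print(f"real MAE={mae_real:.2f}, synthetic MAE={mae_synth:.2f}, gap={gap:.0%}")
```

A gap of 0.10 to 0.15 under this definition would correspond to the "10–15% higher" figure quoted for the same-AP setting.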

📝 Abstract
Synthetic data generation is an appealing tool for augmenting and enriching datasets, playing a crucial role in advancing artificial intelligence (AI) and machine learning (ML). Not only does synthetic data help build robust AI/ML datasets cost-effectively, but it also offers privacy-friendly solutions and bypasses the complexities of storing large data volumes. This paper proposes a novel method to generate synthetic data, based on first-order auto-regressive noise statistics, for large-scale Wi-Fi deployments. The approach operates with minimal real data requirements while producing statistically rich traffic patterns that effectively mimic real Access Point (AP) behavior. Experimental results show that ML models trained on synthetic data achieve Mean Absolute Error (MAE) values within 10 to 15 percent of those obtained using real data when trained on the same APs, while requiring significantly less training data. Moreover, when generalization is required, synthetic-data-trained models improve prediction accuracy by up to 50 percent compared to real-data-trained baselines, thanks to the enhanced variability and diversity of the generated traces. Overall, the proposed method bridges the gap between synthetic data generation and practical Wi-Fi traffic forecasting, providing a scalable, efficient, and real-time solution for modern wireless networks.
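The abstract does not spell out the generator itself, so the following is only a generic sketch of first-order autoregressive (AR(1)) noise, not the authors' method. It assumes the traffic deviation from its mean follows x_t = phi * x_(t-1) + e_t with Gaussian innovations, estimates the mean, standard deviation, and lag-1 autocorrelation from a short real sample, and clips negative loads to zero. The function names, the clipping, and the sample values are all hypothetical.

```python
import math
import random


def fit_ar1(series):
    """Estimate mean, standard deviation, and lag-1 autocorrelation."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series) / n
    if var == 0:
        return mean, 0.0, 0.0
    cov = sum((series[t] - mean) * (series[t - 1] - mean)
              for t in range(1, n)) / n
    # Keep phi strictly inside (-1, 1) so the process stays stationary.
    phi = max(-0.99, min(0.99, cov / var))
    return mean, math.sqrt(var), phi


def generate_ar1(mean, std, phi, length, seed=None):
    """Generate a non-negative trace whose deviations follow AR(1) noise."""
    rng = random.Random(seed)
    # Innovation scale that preserves the target marginal std.
    innovation_std = std * math.sqrt(1.0 - phi ** 2)
    deviation = rng.gauss(0.0, std)
    trace = []
    for _ in range(length):
        trace.append(max(0.0, mean + deviation))  # assumed: load cannot be negative
        deviation = phi * deviation + rng.gauss(0.0, innovation_std)
    return trace


# Hypothetical short sample of per-interval AP load.
sample = [12.0, 15.0, 14.0, 18.0, 17.0, 20.0, 19.0, 16.0]
mean, std, phi = fit_ar1(sample)
synthetic = generate_ar1(mean, std, phi, length=100, seed=0)
```

Varying `mean`, `std`, and `phi` across generated traces is one plausible way to obtain the variability the abstract credits for better generalization, though the paper's actual procedure may differ.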
Problem

Research questions and friction points this paper is trying to address.

synthetic data
traffic forecasting
wireless networks
machine learning
data augmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

synthetic data generation
Wi-Fi traffic forecasting
autoregressive noise modeling
machine learning generalization
data augmentation