D3FL: Data Distribution and Detrending for Robust Federated Learning in Non-linear Time-series Data

📅 2025-07-15

📈 Citations: 0

✨ Influential: 0

career value

194K/year

🤖 AI Summary

In IoT scenarios, nonlinear and nonstationary time series (e.g., generalized extreme value, log-normal distributions) suffer from severe model bias and slow convergence in federated learning (FL) due to inter-device heterogeneity in trends and seasonality. Method: This paper is the first to systematically characterize the detrimental impact of nonlinear distributions on FL performance and proposes a novel federated forecasting framework integrating trend-removal preprocessing—via differencing and STL decomposition—with FedAvg, using LSTM as the base model. Contribution/Results: Evaluated on synthetic and real-world IoT datasets, the framework demonstrates that vanilla FL under nonlinear distributions degrades significantly compared to centralized training. Incorporating adaptive detrending accelerates convergence by up to 37%, reduces average prediction error by 21.5%, and substantially improves robustness and generalization across heterogeneous devices.

Technology Category

Application Category

📝 Abstract

With advancements in computing and communication technologies, the Internet of Things (IoT) has seen significant growth. IoT devices typically collect data from various sensors, such as temperature, humidity, and energy meters. Much of this data is temporal in nature. Traditionally, data from IoT devices is centralized for analysis, but this approach introduces delays and increased communication costs. Federated learning (FL) has emerged as an effective alternative, allowing for model training across distributed devices without the need to centralize data. In many applications, such as smart home energy and environmental monitoring, the data collected by IoT devices across different locations can exhibit significant variation in trends and seasonal patterns. Accurately forecasting such non-stationary, non-linear time-series data is crucial for applications like energy consumption estimation and weather forecasting. However, these data variations can severely impact prediction accuracy. The key contributions of this paper are: (1) Investigating how non-linear, non-stationary time-series data distributions, like generalized extreme value (gen-extreme) and log norm distributions, affect FL performance. (2) Analyzing how different detrending techniques for non-linear time-series data influence the forecasting model's performance in a FL setup. We generated several synthetic time-series datasets using non-linear data distributions and trained an LSTM-based forecasting model using both centralized and FL approaches. Additionally, we evaluated the impact of detrending on real-world datasets with non-linear time-series data distributions. Our experimental results show that: (1) FL performs worse than centralized approaches when dealing with non-linear data distributions. (2) The use of appropriate detrending techniques improves FL performance, reducing loss across different data distributions.

Problem

Research questions and friction points this paper is trying to address.

Investigates impact of non-linear time-series data distributions on federated learning performance

Analyzes effect of detrending techniques on forecasting models in federated learning

Evaluates federated learning vs centralized approaches for non-stationary time-series data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Investigates non-linear time-series data distributions impact

Analyzes detrending techniques for FL performance

Uses LSTM-based forecasting model evaluation

🔎 Similar Papers

No similar papers found.