Share Your Secrets for Privacy! Confidential Forecasting with Vertical Federated Learning

📅 2024-05-31
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address weak privacy guarantees, severe overfitting on small samples, poor convergence in multi-party settings, and high hyperparameter-tuning complexity in vertical federated learning (VFL) for industrial time-series forecasting, this paper proposes STV, the first secret-sharing-based VFL framework for time-series prediction. STV introduces *N*-party secure matrix multiplication and matrix inversion protocols that enable direct parameter optimization with strong convergence and low tuning complexity. It integrates SARIMAX, autoregressive trees, and custom multi-party secure computation protocols, eliminating reliance on a trusted third party. Evaluated on six real-world datasets, STV matches centralized methods in accuracy and outperforms state-of-the-art diffusion models and LSTMs by 23.81%. A communication-cost analysis of direct versus iterative optimization further grounds the choice between the two for practical deployment.
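The secret-sharing primitive underlying STV can be illustrated with additive sharing: a private matrix is split into *N* random shares that sum to the original, so no single share reveals the data. The sketch below uses hypothetical helper names (`share_matrix`, `reconstruct`) and floating-point shares for brevity; real MPC protocols like STV's operate on fixed-point values over a finite ring.

```python
import numpy as np

def share_matrix(secret, n_parties, rng):
    """Split `secret` into n additive shares that sum back to the original.

    The first n-1 shares are drawn uniformly at random; the last is the
    residual, so any subset of fewer than n shares looks random.
    """
    shares = [rng.uniform(-1e3, 1e3, size=secret.shape)
              for _ in range(n_parties - 1)]
    shares.append(secret - sum(shares))
    return shares

def reconstruct(shares):
    """Recover the secret by summing all shares."""
    return sum(shares)

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 3))       # one party's private feature block
shares = share_matrix(X, n_parties=4, rng=rng)
assert np.allclose(reconstruct(shares), X)
```

Because addition distributes over shares, parties can sum shared matrices locally; products and inverses are what require the dedicated N-party protocols the paper contributes.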

📝 Abstract
Vertical federated learning (VFL) is a promising area for time series forecasting in industrial applications, such as predictive maintenance and machine control. Critical challenges to address in manufacturing include data privacy and over-fitting on small and noisy datasets during both training and inference. Additionally, to increase industry adaptability, such forecasting models must scale well with the number of parties while ensuring strong convergence and low tuning complexity. We address those challenges and propose 'Secret-shared Time Series Forecasting with VFL' (STV), a novel framework that exhibits the following key features: i) a privacy-preserving algorithm for forecasting with SARIMAX and autoregressive trees on vertically partitioned data; ii) serverless forecasting using secret sharing and multi-party computation; iii) novel N-party algorithms for matrix multiplication and inverse operations for direct parameter optimization, giving strong convergence with minimal hyperparameter tuning complexity. We conduct evaluations on six representative datasets from public and industry-specific contexts. Our results demonstrate that STV's forecasting accuracy is comparable to that of centralized approaches. They also show that our direct optimization can outperform centralized methods, which include state-of-the-art diffusion models and long short-term memory (LSTM), by 23.81% on forecasting accuracy. We also conduct a scalability analysis by examining the communication costs of direct and iterative optimization to navigate the choice between the two. Code and appendix are available: https://github.com/adis98/STV
Problem

Research questions and friction points this paper is trying to address.

Ensuring data privacy in vertical federated learning for time series forecasting
Addressing over-fitting on small, noisy datasets during training and inference
Scaling forecasting models efficiently with multiple parties while maintaining convergence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Privacy-preserving SARIMAX and autoregressive trees
Decentralized forecasting with secret sharing
N-party matrix operations for exact optimization
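The "exact optimization" in the last bullet means solving model parameters in closed form rather than by gradient iteration. For a linear (S)ARIMAX-style regression this reduces to the normal equations, which is exactly why secure matrix multiplication and inversion are the primitives the paper builds. A plaintext sketch of the computation (illustrative only; in STV each product and the inverse would be evaluated on secret shares by the N-party protocols):

```python
import numpy as np

# Toy AR(2) forecasting problem: predict y[t] from y[t-1] and y[t-2].
rng = np.random.default_rng(1)
series = np.cumsum(rng.standard_normal(200))      # a random-walk series

X = np.column_stack([series[1:-1], series[:-2]])  # lagged regressors
y = series[2:]                                    # targets

# Direct (closed-form) least-squares fit: beta = (X^T X)^{-1} X^T y.
# In the VFL setting, X's columns are split across parties, so X^T X,
# X^T y, and the inverse must each be computed securely.
beta = np.linalg.inv(X.T @ X) @ (X.T @ y)

assert np.allclose(X.T @ X @ beta, X.T @ y)       # normal equations hold
```

One solve replaces many gradient steps, which is the source of the strong convergence and minimal tuning the summary highlights; the trade-off is the communication cost of the secure matrix operations, which the paper's scalability analysis quantifies.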