Scalable and Distributed Individualized Treatment Rules for Massive Datasets

📅 2025-11-08

📈 Citations: 0

✨ Influential: 0

career value

205K/year

🤖 AI Summary

Addressing the challenge of balancing privacy preservation and statistical efficiency in constructing individualized treatment rules (ITRs) from multi-center medical data, this paper proposes a scalable distributed learning framework. Methodologically, we design a convex, smooth loss function via convolutional smoothing and integrate it with a weighted support vector machine optimized via coordinate gradient descent; collaboration across centers relies solely on shared summary statistics—eliminating raw data transmission. The algorithm enjoys linear convergence guarantees with a fixed number of communication rounds. Our key innovation is the first application of convolutional smoothing to distributed ITR estimation, which eliminates local estimation bias while preserving privacy. Experiments on multi-center ICU sepsis data demonstrate substantial improvements in decision accuracy and out-of-sample generalizability, alongside computational efficiency and clinical applicability.

Technology Category

Application Category

📝 Abstract

Synthesizing information from multiple data sources is crucial for constructing accurate individualized treatment rules (ITRs). However, privacy concerns often present significant barriers to the integrative analysis of such multi-source data. Classical meta-learning, which averages local estimates to derive the final ITR, is frequently suboptimal due to biases in these local estimates. To address these challenges, we propose a convolution-smoothed weighted support vector machine for learning the optimal ITR. The accompanying loss function is both convex and smooth, which allows us to develop an efficient multi-round distributed learning procedure for ITRs. Such distributed learning ensures optimal statistical performance with a fixed number of communication rounds, thereby minimizing coordination costs across data centers while preserving data privacy. Our method avoids pooling subject-level raw data and instead requires only sharing summary statistics. Additionally, we develop an efficient coordinate gradient descent algorithm, which guarantees at least linear convergence for the resulting optimization problem. Extensive simulations and an application to sepsis treatment across multiple intensive care units validate the effectiveness of the proposed method.

Problem

Research questions and friction points this paper is trying to address.

Addressing privacy barriers in multi-source data analysis for treatment rules

Overcoming suboptimal performance of classical meta-learning for ITRs

Developing distributed learning methods that preserve data privacy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Convolution-smoothed weighted SVM for optimal ITRs

Multi-round distributed learning with summary statistics

Coordinate gradient descent ensuring linear convergence

🔎 Similar Papers

A Comprehensive Survey on Retrieval Methods in Recommender Systems

2024-07-11arXiv.orgCitations: 15

💼 Related Jobs

Research Engineer, Monetization AI