Scalable and Distributed Individualized Treatment Rules for Massive Datasets

📅 2025-11-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Addressing the challenge of balancing privacy preservation and statistical efficiency in constructing individualized treatment rules (ITRs) from multi-center medical data, this paper proposes a scalable distributed learning framework. Methodologically, we design a convex, smooth loss function via convolutional smoothing and integrate it with a weighted support vector machine optimized via coordinate gradient descent; collaboration across centers relies solely on shared summary statistics—eliminating raw data transmission. The algorithm enjoys linear convergence guarantees with a fixed number of communication rounds. Our key innovation is the first application of convolutional smoothing to distributed ITR estimation, which eliminates local estimation bias while preserving privacy. Experiments on multi-center ICU sepsis data demonstrate substantial improvements in decision accuracy and out-of-sample generalizability, alongside computational efficiency and clinical applicability.

Technology Category

Application Category

📝 Abstract
Synthesizing information from multiple data sources is crucial for constructing accurate individualized treatment rules (ITRs). However, privacy concerns often present significant barriers to the integrative analysis of such multi-source data. Classical meta-learning, which averages local estimates to derive the final ITR, is frequently suboptimal due to biases in these local estimates. To address these challenges, we propose a convolution-smoothed weighted support vector machine for learning the optimal ITR. The accompanying loss function is both convex and smooth, which allows us to develop an efficient multi-round distributed learning procedure for ITRs. Such distributed learning ensures optimal statistical performance with a fixed number of communication rounds, thereby minimizing coordination costs across data centers while preserving data privacy. Our method avoids pooling subject-level raw data and instead requires only sharing summary statistics. Additionally, we develop an efficient coordinate gradient descent algorithm, which guarantees at least linear convergence for the resulting optimization problem. Extensive simulations and an application to sepsis treatment across multiple intensive care units validate the effectiveness of the proposed method.
Problem

Research questions and friction points this paper is trying to address.

Addressing privacy barriers in multi-source data analysis for treatment rules
Overcoming suboptimal performance of classical meta-learning for ITRs
Developing distributed learning methods that preserve data privacy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Convolution-smoothed weighted SVM for optimal ITRs
Multi-round distributed learning with summary statistics
Coordinate gradient descent ensuring linear convergence
🔎 Similar Papers
No similar papers found.
Nan Qiao
Nan Qiao
Amazon
Semantic SegmentationRepresentation LearningActive Learning
W
Wangcheng Li
School of Statistics, Beijing Normal University, Beijing, China
J
Jingxiao Zhang
Center for Applied Statistics and School of Statistics, Renmin University of China, Beijing, China
C
Canyi Chen
Department of Biostatistics, University of Michigan, Ann Arbor, United States