Unified Distributed Estimation Framework for Sufficient Dimension Reduction Based on Conditional Moments

📅 2025-09-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Sufficient dimension reduction (SDR) faces significant challenges in distributed high-dimensional big data settings due to massive sample sizes, high dimensionality, and strong node heterogeneity. Method: This paper proposes a unified distributed estimation framework for SDR based on conditional moments. It achieves exact distributed sliced inverse regression (SIR) estimation by combining local conditional moment modeling with global consistency constraints, which naturally accommodates heterogeneous data structures, and employs a low-communication-cost distributed iterative optimization strategy to substantially reduce both computational and communication overhead. Contribution/Results: Theoretically, the global estimator is proven to be √n-consistent and asymptotically normal. Empirically, the method matches the accuracy of centralized SIR on both synthetic and real-world datasets, while remaining robust when the SDR method fails locally on some nodes or when heterogeneity increases.

📝 Abstract
Massive datasets are now typically dispersed across multiple locations and pose the dual challenges of high dimensionality and huge sample size. It is therefore necessary to explore sufficient dimension reduction (SDR) methods for distributed data. In this paper, we first propose an exact distributed estimation of sliced inverse regression, which substantially improves computational efficiency while yielding an estimate identical to that obtained on the full sample. We then propose a unified distributed framework for general conditional-moment-based inverse regression methods. This framework allows data at different locations to have distinct population structures, thus addressing the issue of heterogeneity. To assess the effectiveness of the proposed methods, we conduct simulations under various data generation mechanisms and examine scenarios in which samples are scattered across local nodes homogeneously in equal sizes, heterogeneously in equal sizes, and heterogeneously in unequal sizes. Our findings highlight the versatility and applicability of the unified framework. Meanwhile, the communication cost is practically acceptable and the computation cost is greatly reduced. Sensitivity analysis verifies the robustness of the algorithm under extreme conditions where the SDR method fails locally on some nodes. A real data analysis also demonstrates the superior performance of the algorithm.
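To make the idea of an exact distributed SIR estimate concrete, here is a minimal sketch, not the authors' algorithm. It assumes that all nodes share the same slice boundaries for the response and that the data are pooled under a single population (the homogeneous case). Each node then only needs to send its per-slice counts, per-slice predictor sums, and Gram matrix. The function names `local_summaries` and `sir_from_summaries`, the simulated data, and the choice of 10 slices are all hypothetical.

```python
import numpy as np


def local_summaries(X, y, edges):
    """Per-node statistics for SIR. Assumption: `edges` is shared by all nodes."""
    H = len(edges) + 1
    idx = np.searchsorted(edges, y, side="right")   # slice index in 0..H-1
    counts = np.bincount(idx, minlength=H).astype(float)
    slice_sums = np.zeros((H, X.shape[1]))
    np.add.at(slice_sums, idx, X)                   # sum of predictors per slice
    return counts, slice_sums, X.T @ X              # counts, slice sums, Gram matrix


def sir_from_summaries(summaries, d):
    """Add up node summaries, then solve the SIR generalized eigenproblem."""
    counts = sum(s[0] for s in summaries)
    slice_sums = sum(s[1] for s in summaries)
    gram = sum(s[2] for s in summaries)
    n = counts.sum()
    mean = slice_sums.sum(axis=0) / n
    cov = gram / n - np.outer(mean, mean)
    keep = counts > 0                               # ignore empty slices
    centered = slice_sums[keep] / counts[keep, None] - mean
    kernel = (centered * (counts[keep, None] / n)).T @ centered
    # Solve kernel v = lambda * cov v by whitening with the Cholesky factor of cov.
    L = np.linalg.cholesky(cov)
    A = np.linalg.solve(L, np.linalg.solve(L, kernel).T)
    _, U = np.linalg.eigh((A + A.T) / 2)            # eigenvalues in ascending order
    return np.linalg.solve(L.T, U[:, ::-1][:, :d])  # top-d directions


# Illustrative data: one true direction, three nodes of unequal size.
rng = np.random.default_rng(0)
n, p = 3000, 6
X = rng.normal(size=(n, p))
beta = np.array([1.0, -1.0, 0.5, 0.0, 0.0, 0.0])
y = (X @ beta) ** 3 + 0.5 * rng.normal(size=n)
# Assumption: slice edges are agreed on in advance (here, pooled deciles).
edges = np.quantile(y, np.linspace(0, 1, 11)[1:-1])
parts = np.split(np.arange(n), [500, 1200])

node_stats = [local_summaries(X[i], y[i], edges) for i in parts]
B_dist = sir_from_summaries(node_stats, d=1)
B_full = sir_from_summaries([local_summaries(X, y, edges)], d=1)
```

Because every quantity entering the eigenproblem is a sum over samples, adding the node summaries reproduces the full-sample statistics up to floating-point rounding, so `B_dist` and `B_full` span the same subspace. The heterogeneous setting that the unified framework targets needs more than this simple aggregation and is not covered here.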
Problem

Research questions and friction points this paper is trying to address.

Distributed estimation for sufficient dimension reduction
Handling high-dimensional heterogeneous data across locations
Improving computational efficiency while maintaining estimation accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributed sliced inverse regression estimation
Unified framework for conditional-moment methods
Handles heterogeneous data across distributed locations
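
As a hedged illustration of why a distributed SIR estimate can coincide with the full-sample one, the standard SIR quantities can be written as sums of node-level statistics. The notation below is assumed for illustration and is not taken from the paper: K nodes, node k holding n_k samples, and H response slices shared by all nodes.

```latex
% Assumed notation (not from the paper): n = \sum_k n_k, slices I_1, \dots, I_H.
\begin{align*}
n_{kh} &= \sum_{i \in \text{node } k} \mathbf{1}\{y_i \in I_h\}, &
s_{kh} &= \sum_{i \in \text{node } k} x_i \,\mathbf{1}\{y_i \in I_h\}, &
G_k &= \sum_{i \in \text{node } k} x_i x_i^{\top}, \\
\bar{x} &= \frac{1}{n} \sum_{k=1}^{K} \sum_{h=1}^{H} s_{kh}, &
\hat{\Sigma} &= \frac{1}{n} \sum_{k=1}^{K} G_k - \bar{x}\bar{x}^{\top}, &
\hat{m}_h &= \frac{\sum_{k=1}^{K} s_{kh}}{\sum_{k=1}^{K} n_{kh}}, \\
\hat{p}_h &= \frac{1}{n} \sum_{k=1}^{K} n_{kh}, &
\hat{M} &= \sum_{h=1}^{H} \hat{p}_h (\hat{m}_h - \bar{x})(\hat{m}_h - \bar{x})^{\top}, &
\hat{M} v &= \lambda \hat{\Sigma} v .
\end{align*}
```

Under these assumptions each node transmits only H counts, H vectors of length p, and one p × p matrix, independent of its local sample size, and the resulting eigenvectors v equal those computed on the pooled data.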
Hongying Li
Department of Statistics, The Ohio State University, Columbus, Ohio, United States of America
Minyi Zhu
School of Statistics and Mathematics, Central University of Finance and Economics, Beijing, China
Yaqi Cao
School of Science, Minzu University of China, Beijing, China
Xinyi Xu
Meta
data-centric machine learning, federated learning, multi-agent systems, cooperative game theory