Collaborative Inference for Sparse High-Dimensional Models with Non-Shared Data

📅 2025-04-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses hypothesis testing when multiple data holders cannot share raw data and the model involves high-dimensional nuisance parameters together with linear constraints of diverging dimension. Method: The authors propose the Collaborative Score Test (CST), a collaborative testing procedure that, unlike existing methods, involves constrained optimization under non-shared, high-dimensional data settings. CST decomposes the Kiefer-Bahadur representation of the traditional score statistic, approximates its key components with aggregated local gradient information from each data source, and applies a two-stage partial penalization strategy to shrink the approximation error and mitigate bias from the nuisance parameters. Contribution/Results: The limiting distributions of the CST statistic are derived under both the null hypothesis and local alternatives. CST exhibits an oracle property, achieves global statistical efficiency, and relaxes the stringent restrictions on the number of data sources found in the current literature. Numerical studies and a real-data example support its effectiveness and validity.

📝 Abstract

In modern data analysis, effective collaboration among multiple data holders with non-shared data is expected to improve statistical efficiency. In this article, we propose a collaborative score-type test (CST) for testing linear hypotheses, which accommodates potentially high-dimensional nuisance parameters and a diverging number of constraints and target parameters. Through a careful decomposition of the Kiefer-Bahadur representation for the traditional score statistic, we identify and approximate the key components using aggregated local gradient information from each data source. In addition, we employ a two-stage partial penalization strategy to shrink the approximation error and mitigate the bias from the high-dimensional nuisance parameters. Unlike existing methods, the CST procedure involves constrained optimization under non-shared and high-dimensional data settings, which requires novel theoretical developments. We derive the limiting distributions of the CST statistic under the null hypothesis and under local alternatives. The CST also exhibits an oracle property and achieves global statistical efficiency. Moreover, it relaxes the stringent restrictions on the number of data sources required in the current literature. Extensive numerical studies and a real example demonstrate the effectiveness and validity of our proposed method.
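
To make the phrase "aggregated local gradient information" concrete, here is a minimal sketch of the general idea, not the CST procedure itself. It assumes a squared-error loss and a sample-size-weighted average, neither of which is stated in the abstract; the names `local_gradient` and `aggregate_gradients` and the toy data are hypothetical. Each data holder evaluates the gradient of its own loss at a common parameter value and sends only that vector, and the weighted average reproduces the gradient that pooled data would give.

```python
from typing import List, Sequence

Vector = List[float]


def local_gradient(X: Sequence[Vector], y: Sequence[float], beta: Vector) -> Vector:
    """Gradient of the average squared-error loss (1/2n) * sum_i (x_i'beta - y_i)^2.

    Assumption: squared-error loss, chosen only for illustration.
    """
    n = len(y)
    grad = [0.0] * len(beta)
    for x_i, y_i in zip(X, y):
        resid = sum(a * b for a, b in zip(x_i, beta)) - y_i
        for j, x_ij in enumerate(x_i):
            grad[j] += resid * x_ij / n
    return grad


def aggregate_gradients(grads: Sequence[Vector], sizes: Sequence[int]) -> Vector:
    """Sample-size-weighted average of the local gradients (hypothetical helper)."""
    total = sum(sizes)
    p = len(grads[0])
    return [sum(n_k * g[j] for g, n_k in zip(grads, sizes)) / total for j in range(p)]


# Toy data held by two sites; the raw rows never leave their site.
X_a, y_a = [[1.0, 0.0], [1.0, 1.0]], [1.0, 2.0]
X_b, y_b = [[1.0, 2.0]], [2.0]
beta = [0.5, 0.5]  # common evaluation point shared with every site

# Each site sends only its gradient vector and its sample size.
g_a = local_gradient(X_a, y_a, beta)
g_b = local_gradient(X_b, y_b, beta)
aggregated = aggregate_gradients([g_a, g_b], [len(y_a), len(y_b)])

# Reference value computed on pooled data, for comparison only.
pooled = local_gradient(X_a + X_b, y_a + y_b, beta)
```

In this toy setting `aggregated` and `pooled` agree up to floating-point error, which is why gradient sharing can stand in for data sharing. The decomposition of the score statistic, the two-stage partial penalization, and the test statistic described in the abstract are not reproduced here.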

Problem

Research questions and friction points this paper is trying to address.

Testing linear hypotheses with high-dimensional nuisance parameters

Collaborative inference under non-shared data constraints

Achieving global statistical efficiency with sparse models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Collaborative score-type test for non-shared data

Two-stage partial penalization for bias reduction

Aggregated local gradients for high-dimensional parameters
