🤖 AI Summary
In transfer learning for high-dimensional regression with few-shot target tasks (TL-HDR), existing feature selection methods lack a statistically rigorous assessment of significance. Method: We propose PTL-SI—the first valid post-selection inference framework for transfer learning—integrating bias-corrected estimation, conditional inference, and block-wise sampling to enable exact p-value computation for high-dimensional features after transfer. Contribution/Results: PTL-SI theoretically guarantees false positive rate (FPR) control at any pre-specified level (e.g., α = 0.05). Its “divide-and-conquer” strategy maintains FPR control while substantially improving statistical power. Experiments on synthetic and real high-dimensional datasets confirm that PTL-SI’s p-values are uniformly distributed under the null, that the FPR stays at the nominal level, and that its power significantly surpasses state-of-the-art alternatives.
📝 Abstract
Transfer learning (TL) for high-dimensional regression (HDR) is an important problem in machine learning, particularly when the sample size in the target task is limited. However, no existing method quantifies the statistical significance of the relationship between features and the response in TL-HDR settings. In this paper, we introduce a novel statistical inference framework for assessing the reliability of feature selection in TL-HDR, called PTL-SI (Post-TL Statistical Inference). The core contribution of PTL-SI is its ability to provide valid $p$-values for features selected in TL-HDR, thereby rigorously controlling the false positive rate (FPR) at a desired significance level $\alpha$ (e.g., 0.05). Furthermore, we enhance statistical power by incorporating a strategic divide-and-conquer approach into our framework. We demonstrate the validity and effectiveness of the proposed PTL-SI through extensive experiments on both synthetic and real-world high-dimensional datasets, confirming its theoretical properties and its utility for testing the reliability of feature selection in TL scenarios.
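The key property claimed above—valid $p$-values that are uniform under the null—directly implies FPR control at level $\alpha$. A minimal simulation sketch (not the PTL-SI algorithm itself, just the general principle) illustrates this: if null $p$-values are uniform on $[0, 1]$, then rejecting whenever $p < \alpha$ yields a false positive rate of $\alpha$.

```python
import numpy as np

# Illustrative sketch of why uniform null p-values give FPR control:
# P(p < alpha) = alpha when p ~ Uniform(0, 1) under the null.
rng = np.random.default_rng(0)
alpha = 0.05

# Simulated p-values from a valid test under the null hypothesis.
null_pvalues = rng.uniform(0.0, 1.0, size=100_000)

# Empirical false positive rate: fraction of null p-values below alpha.
fpr = np.mean(null_pvalues < alpha)
print(f"empirical FPR at alpha={alpha}: {fpr:.3f}")  # approximately 0.05
```

Conversely, a test whose null $p$-values are anti-conservative (stochastically smaller than uniform)—as happens when selection is ignored in post-selection settings—would reject more than an $\alpha$ fraction of true nulls, which is the failure mode PTL-SI is designed to prevent.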