Post-Transfer Learning Statistical Inference in High-Dimensional Regression

📅 2025-04-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
In transfer learning for high-dimensional regression with few-shot target tasks (TL-HDR), existing feature selection methods lack a statistically rigorous significance assessment. Method: We propose PTL-SI, a post-selection inference framework for transfer learning that integrates bias-corrected estimation, conditional inference, and block-wise sampling to enable exact p-value computation for high-dimensional features after transfer. Contribution/Results: PTL-SI theoretically guarantees strict false positive rate (FPR) control at any pre-specified level (e.g., α = 0.05). Its divide-and-conquer strategy maintains FPR control while substantially improving statistical power. Experiments on synthetic and real high-dimensional datasets confirm that PTL-SI's p-values are uniformly distributed under the null, that the FPR remains stable at the nominal level, and that its power significantly surpasses state-of-the-art alternatives.

📝 Abstract
Transfer learning (TL) for high-dimensional regression (HDR) is an important problem in machine learning, particularly when the sample size in the target task is limited. However, there is currently no method to quantify the statistical significance of the relationship between features and the response in TL-HDR settings. In this paper, we introduce a novel statistical inference framework for assessing the reliability of feature selection in TL-HDR, called PTL-SI (Post-TL Statistical Inference). The core contribution of PTL-SI is its ability to provide valid $p$-values for features selected in TL-HDR, thereby rigorously controlling the false positive rate (FPR) at a desired significance level $\alpha$ (e.g., 0.05). Furthermore, we enhance statistical power by incorporating a strategic divide-and-conquer approach into our framework. We demonstrate the validity and effectiveness of the proposed PTL-SI through extensive experiments on both synthetic and real-world high-dimensional datasets, confirming its theoretical properties and utility in testing the reliability of feature selection in TL scenarios.
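The abstract's central point is that naive $p$-values computed after feature selection are invalid: selection inflates the false positive rate above the nominal $\alpha$, which is exactly what post-selection inference frameworks like PTL-SI are designed to correct. A minimal simulation (not the paper's method; all parameter choices here are illustrative) makes the problem concrete: on pure-noise data, pick the feature most correlated with the response, then test it with a standard $t$-test that ignores the selection step.

```python
# Illustrative sketch (assumed setup, not PTL-SI itself): why naive
# p-values after feature selection over-reject, motivating valid
# post-selection inference with guaranteed FPR control at alpha.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p, reps, alpha = 50, 20, 500, 0.05
false_positives = 0

for _ in range(reps):
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)            # pure noise: every null hypothesis is true
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
    j = np.argmax(np.abs(r))              # selection: keep the most correlated feature
    t = r[j] * np.sqrt(n - 2) / np.sqrt(1 - r[j] ** 2)
    p_naive = 2 * stats.t.sf(abs(t), df=n - 2)  # t-test ignoring the selection event
    false_positives += p_naive < alpha

fpr = false_positives / reps
print(f"naive FPR after selection: {fpr:.2f} (nominal level {alpha})")
```

With 20 candidate features the naive FPR lands far above 0.05, because the test is applied to the winner of a 20-way search. A valid post-selection procedure conditions on the selection event so that, under the null, the reported $p$-values are uniform and the FPR stays at $\alpha$.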
Problem

Research questions and friction points this paper is trying to address.

Quantify statistical significance in transfer learning for high-dimensional regression
Provide valid p-values for feature selection in TL-HDR settings
Control false positive rate and enhance statistical power
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces PTL-SI for statistical inference in TL-HDR
Provides valid p-values for feature selection reliability
Enhances power via divide-and-conquer strategy
Nguyen Vu Khai Tam
University of Information Technology, Ho Chi Minh City, Vietnam. Vietnam National University, Ho Chi Minh City, Vietnam.
Cao Huyen My
University of Information Technology, Ho Chi Minh City, Vietnam. Vietnam National University, Ho Chi Minh City, Vietnam.
Vo Nguyen Le Duy
Lecturer at University of Information Technology / Visiting Scientist at RIKEN
Machine Learning · Data Science · Statistics