Statistical Inference for Sequential Feature Selection after Domain Adaptation

📅 2025-01-17
🤖 AI Summary
Sequential feature selection after domain adaptation (SeqFS-DA) lacks rigorous statistical control, making it difficult to guarantee that the false positive rate (FPR) stays at or below a significance level α. Method: We propose the first significance testing framework for SeqFS-DA with theoretical guarantees, grounded in conditional inference. It integrates data perturbation, asymptotic normality analysis, and resampling-based *p*-value calibration, enabling unified inference across standard model selection criteria, including AIC, BIC, and adjusted *R*². Contribution/Results: We prove that our method achieves exact FPR control in high-dimensional, large-scale settings, with higher statistical power than existing baselines. Extensive experiments on synthetic and real-world datasets confirm tight FPR control and superior detection performance. To our knowledge, this is the first work to deliver reproducible, verifiable statistical reliability for SeqFS-DA results.

📝 Abstract
In high-dimensional regression, feature selection methods such as sequential feature selection (SeqFS) are commonly used to identify relevant features. When data is limited, domain adaptation (DA) becomes crucial for transferring knowledge from a related source domain to a target domain, improving generalization performance. Although SeqFS after DA is an important task in machine learning, none of the existing methods can guarantee the reliability of its results. In this paper, we propose a novel method for testing the features selected by SeqFS-DA. The main advantage of the proposed method is its capability to control the false positive rate (FPR) below a significance level $\alpha$ (e.g., 0.05). Additionally, a strategic approach is introduced to enhance the statistical power of the test. Furthermore, we provide extensions of the proposed method to SeqFS with model selection criteria including AIC, BIC, and adjusted R-squared. Extensive experiments are conducted on both synthetic and real-world datasets to validate the theoretical results and demonstrate the proposed method's superior performance.
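
For orientation, the selection step that the abstract refers to can be sketched as plain forward SeqFS with a fixed number of steps, together with a BIC score of the kind the extensions mention. This is a minimal sketch under stated assumptions: a linear model without intercept fitted by least squares, and a Gaussian BIC with estimated noise variance. The names `rss`, `forward_seqfs`, and `bic`, and the synthetic data, are hypothetical and not taken from the paper. The DA step and the proposed test are not shown, since the abstract does not specify them.

```python
import numpy as np


def rss(X, y, features):
    """Residual sum of squares of the least-squares fit on the chosen columns."""
    if not features:
        return float(y @ y)  # empty model predicts zero (no intercept assumed)
    Xs = X[:, features]
    coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ coef
    return float(resid @ resid)


def forward_seqfs(X, y, k):
    """Greedy forward selection: k times, add the feature that lowers RSS most."""
    selected = []
    for _ in range(k):
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        best = min(remaining, key=lambda j: rss(X, y, selected + [j]))
        selected.append(best)
    return selected


def bic(X, y, features):
    """Gaussian BIC up to an additive constant, with the noise variance estimated."""
    n = len(y)
    return n * np.log(rss(X, y, features) / n) + len(features) * np.log(n)


# Synthetic example (assumed setup): only features 2 and 5 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = 3.0 * X[:, 2] - 2.0 * X[:, 5] + 0.1 * rng.normal(size=100)

selected = forward_seqfs(X, y, k=2)
print(selected, bic(X, y, selected) < bic(X, y, []))
```

Ordinary t-tests applied to features chosen this way ignore the fact that the same data picked them, which is the selection effect an FPR guarantee for SeqFS has to account for.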
Problem

Research questions and friction points this paper is trying to address.

Domain Adaptation
Sequential Feature Selection
Statistical Control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Feature Selection Validation
Domain Adaptation
Error Probability Control
Duong Tan Loc
University of Information Technology, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam
Nguyen Thang Loi
University of Information Technology, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam
Vo Nguyen Le Duy
Lecturer at University of Information Technology / Visiting Scientist at RIKEN
Machine Learning, Data Science, Statistics