Valid F-screening in linear regression

📅 2025-05-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the “F-screening” problem in linear regression, in which coefficient inference is reported only if the global F-test is significant, so that standard p-values and confidence intervals lose their validity conditional on rejection. The authors propose a method for conditionally valid inference that requires only conventional regression output (coefficients, standard errors, the F-statistic) and no access to the raw data. The approach provides analytically tractable selective p-values, bias-corrected point estimates, and confidence intervals adjusted for the F-screening selection mechanism, unifying selective inference with conditional hypothesis testing. Because it needs no sample splitting, the method supports retrospective analysis of published studies while rigorously controlling the selective Type 1 error and attaining nominal coverage. Empirical evaluations show substantially higher statistical power than sample-splitting alternatives, and the method is demonstrated via re-analysis of two datasets from the biomedical literature.

📝 Abstract
Suppose that a data analyst wishes to report the results of a least squares linear regression only if the overall null hypothesis, $H_0^{1:p}: \beta_1 = \beta_2 = \ldots = \beta_p = 0$, is rejected. This practice, which we refer to as F-screening (since the overall null hypothesis is typically tested using an $F$-statistic), is in fact common practice across a number of applied fields. Unfortunately, it poses a problem: standard guarantees for the inferential outputs of linear regression, such as Type 1 error control of hypothesis tests and nominal coverage of confidence intervals, hold unconditionally, but fail to hold conditional on rejection of the overall null hypothesis. In this paper, we develop an inferential toolbox for the coefficients in a least squares model that are valid conditional on rejection of the overall null hypothesis. We develop selective p-values that lead to tests that control the selective Type 1 error, i.e., the Type 1 error conditional on having rejected the overall null hypothesis. Furthermore, they can be computed without access to the raw data, i.e., using only the standard outputs of a least squares linear regression, and therefore are suitable for use in a retrospective analysis of a published study. We also develop confidence intervals that attain nominal selective coverage, and point estimates that account for having rejected the overall null hypothesis. We show empirically that our selective procedure is preferable to an alternative approach that relies on sample splitting, and we demonstrate its performance via re-analysis of two datasets from the biomedical literature.
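The failure mode the abstract describes can be seen in a short Monte Carlo sketch. The simulation below is not the paper's method; it simply illustrates, under an assumed Gaussian design with all coefficients equal to zero (no intercept, for simplicity), that the naive per-coefficient t-test no longer controls the Type 1 error at the nominal level once we condition on the overall F-test rejecting.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p, alpha = 50, 5, 0.05
n_sim = 20000
f_crit = stats.f.ppf(1 - alpha, p, n - p)  # F-screening threshold

screened = 0    # datasets passing the overall F-test
rejections = 0  # naive t-test rejections for beta_1 among screened datasets

for _ in range(n_sim):
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)  # global null holds: beta_1 = ... = beta_p = 0
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    sigma2 = resid @ resid / (n - p)
    # overall F-statistic for H_0: beta = 0 (no-intercept model)
    F = (beta_hat @ (X.T @ X) @ beta_hat / p) / sigma2
    if F > f_crit:  # F-screening: report coefficients only on rejection
        screened += 1
        se1 = np.sqrt(sigma2 * XtX_inv[0, 0])
        pval = 2 * stats.t.sf(abs(beta_hat[0] / se1), n - p)
        rejections += pval < alpha

print(f"screened fraction: {screened / n_sim:.3f}")            # ~ alpha
print(f"conditional Type 1 error: {rejections / screened:.3f}")  # well above alpha
```

Unconditionally, the t-test rejects about 5% of the time; restricted to datasets that survive F-screening, the rejection rate is markedly inflated, which is exactly the selective Type 1 error the paper's procedure is designed to control.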
Problem

Research questions and friction points this paper is trying to address.

Ensuring valid inference after rejecting the overall null hypothesis
Controlling the selective Type 1 error in post-screening tests
Providing valid confidence intervals conditional on F-screening
Innovation

Methods, ideas, or system contributions that make the work stand out.

Selective p-values for conditional error control
Confidence intervals with nominal selective coverage
Point estimates adjusting for null hypothesis rejection
Olivia McGough
Department of Statistics, University of Washington
Daniela Witten
Professor of Statistics & Biostatistics, Dorothy Gilford Endowed Chair, University of Washington
statistics, machine learning
Daniel Kessler
Department of Statistics and Operations Research, and School of Data Science and Society, University of North Carolina at Chapel Hill