Reluctant Interaction Inference after Additive Modeling

📅 2025-06-02

📈 Citations: 0

✨ Influential: 0

career value

205K/year

🤖 AI Summary

A fundamental modeling decision—whether to augment a fitted additive model with interaction terms—lacks a statistically rigorous framework for hypothesis testing. Method: We propose the first selective inference–based statistical test for this problem, using the sparse additive model (SPAM) as the null hypothesis. We formalize the “interaction hesitation” principle as a data-adaptive selective hypothesis test and introduce external randomization to correct overfitting bias, enabling rigorous construction of conditional p-values that provably control Type I error. Contribution/Results: Our method is validated on synthetic and real-world datasets: with minimal randomization, it significantly outperforms naive tests and data-splitting approaches—achieving strict false rejection rate control while markedly improving statistical power. This work establishes a new paradigm for interaction assessment in interpretable modeling: statistically valid, computationally feasible, and grounded in selective inference theory.

Technology Category

Application Category

📝 Abstract

Additive models enjoy the flexibility of nonlinear models while still being readily understandable to humans. By contrast, other nonlinear models, which involve interactions between features, are not only harder to fit but also substantially more complicated to explain. Guided by the principle of parsimony, a data analyst therefore may naturally be reluctant to move beyond an additive model unless it is truly warranted. To put this principle of interaction reluctance into practice, we formulate the problem as a hypothesis test with a fitted sparse additive model (SPAM) serving as the null. Because our hypotheses on interaction effects are formed after fitting a SPAM to the data, we adopt a selective inference approach to construct p-values that properly account for this data adaptivity. Our approach makes use of external randomization to obtain the distribution of test statistics conditional on the SPAM fit, allowing us to derive valid p-values, corrected for the over-optimism introduced by the data-adaptive process prior to the test. Through experiments on simulated and real data, we illustrate that--even with small amounts of external randomization--this rigorous modeling approach enjoys considerable advantages over naive methods and data splitting.

Problem

Research questions and friction points this paper is trying to address.

Testing interaction effects after fitting additive models

Addressing over-optimism in data-adaptive hypothesis testing

Validating interactions without compromising model interpretability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses sparse additive model (SPAM) as null hypothesis

Applies selective inference for valid p-values

Employs external randomization for test statistics

🔎 Similar Papers

Error-controlled non-additive interaction discovery in machine learning models