AI Summary
This work addresses the challenge that existing conditional independence tests often fail to maintain valid frequentist error control in small-sample regimes or under model misspecification, leading to miscalibrated p-values and inflated false discovery rates. To remedy this, the authors propose ECCIT, a novel method that introduces an adversarial calibration mechanism. ECCIT leverages adversarial learning to detect calibration discrepancies and applies a data-driven, monotone transformation to recalibrate p-values from any base test, such as the Generalized Covariance Measure (GCM) or the Holdout Randomization Test (HRT). The approach combines accuracy in small samples with robustness to misspecification in large samples, controlling the false discovery rate while substantially improving statistical power across synthetic and real-world datasets.
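The summary does not specify how calibration discrepancies are quantified. As a minimal illustrative sketch (not ECCIT's actual metric), one simple way to score miscalibration is the Kolmogorov-Smirnov distance between p-values obtained under the null and the Uniform(0, 1) distribution; the function name and the Beta-distributed example below are assumptions for illustration only:

```python
import numpy as np
from scipy import stats

def miscalibration_ks(null_pvals: np.ndarray) -> float:
    """Kolmogorov-Smirnov distance between the empirical distribution of
    null p-values and Uniform(0, 1); larger values mean worse calibration."""
    return stats.kstest(null_pvals, "uniform").statistic

rng = np.random.default_rng(0)
well_calibrated = rng.uniform(size=500)           # what a valid test yields under the null
anti_conservative = rng.beta(0.5, 1.0, size=500)  # hypothetical p-values skewed toward 0
print(miscalibration_ks(well_calibrated))    # close to 0
print(miscalibration_ks(anti_conservative))  # noticeably larger
```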
Abstract
Conditional independence tests (CITs) are widely used for causal discovery and feature selection. Even with false discovery rate (FDR) control procedures, they often fail to provide frequentist guarantees in practice. We highlight two common failure modes: (i) in small samples, the asymptotic guarantees of many CITs can be inaccurate, and even correctly specified models may fail to estimate noise levels well enough to control the error; and (ii) when sample sizes are large but models are misspecified, unaccounted dependencies skew the test's behavior, so p-values are no longer uniform under the null. We propose Empirically Calibrated Conditional Independence Tests (ECCIT), a method that measures and corrects for miscalibration. For a chosen base CIT (e.g., GCM, HRT), ECCIT optimizes an adversary that selects features and response functions to maximize a miscalibration metric. ECCIT then fits a monotone calibration map that adjusts the base-test p-values in proportion to the observed miscalibration. Across empirical benchmarks on synthetic and real data, ECCIT achieves valid FDR control with higher power than existing calibration strategies while remaining test-agnostic.
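The abstract does not describe the form of the monotone calibration map. As a minimal sketch, assuming isotonic regression is an acceptable stand-in (the helper name, quantile targets, and Beta-distributed null p-values below are illustrative assumptions, not ECCIT's implementation), one could fit a monotone map from the base test's null p-values to their empirical quantiles and apply it to new p-values before FDR control:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def fit_monotone_calibration(null_pvals: np.ndarray) -> IsotonicRegression:
    """Fit a monotone map that pushes the empirical null p-value distribution
    toward Uniform(0, 1): sorted null p-values are regressed onto their
    empirical quantiles under a monotonicity constraint."""
    p_sorted = np.sort(null_pvals)
    quantiles = np.arange(1, len(p_sorted) + 1) / (len(p_sorted) + 1)
    iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
    return iso.fit(p_sorted, quantiles)

# Usage: recalibrate base-test p-values before applying an FDR procedure.
rng = np.random.default_rng(0)
null_p = rng.beta(0.5, 1.0, size=1000)  # hypothetical miscalibrated null p-values
calibrator = fit_monotone_calibration(null_p)
adjusted = calibrator.predict(np.array([0.001, 0.01, 0.05, 0.2]))
```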