When Knockoffs fail: diagnosing and fixing non-exchangeability of Knockoffs

📅 2024-07-09
📈 Citations: 2
Influential: 1
🤖 AI Summary
In high-dimensional statistical inference, the knockoff framework can fail to control the false discovery rate (FDR) when variables are strongly dependent, because the exchangeability property its guarantees rely on breaks down. Method: We propose a diagnosable knockoff framework: (i) we characterize how classical knockoff generators lose exchangeability under strong dependence; (ii) we design a diagnostic based on classifier two-sample tests that makes knockoff validity empirically checkable; and (iii) we introduce an alternative knockoff construction that predicts each variable from all the others and empirically restores FDR control at level $q$. Contribution/Results: A computationally efficient variant of this construction is also proposed. Experiments on synthetic data and semi-simulated neuroimaging data show a large reduction in false positives and recovery of the nominal FDR guarantee ($\text{FDR} \leq q$).

📝 Abstract
Knockoffs are a popular statistical framework for conditional variable selection in high-dimensional settings with statistical control, which is essential for reliable inference. However, knockoff guarantees rely on an exchangeability assumption that is difficult to test in practice, and the literature says little about what to do when it is violated. This assumption is related to the ability to generate data similar to the observed data. To maintain reliable inference, we introduce a diagnostic tool based on Classifier Two-Sample Tests. Using simulations and real data, we show that violations of this assumption occur in common settings for classical knockoff generators, especially when the data have a strong dependence structure. As a consequence, knockoff-based inference suffers from a massive inflation of false positives, and we show that the diagnostic tool correctly detects this behavior. An alternative knockoff construction, which builds a predictor of each variable from all the others, solves the issue. We also propose a computationally efficient variant of this algorithm and show empirically that it restores error control on simulated data and in semi-simulated experiments based on neuroimaging data.
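To make the diagnostic idea concrete, here is a minimal sketch of a classifier two-sample test applied to knockoffs. It is an illustration under stated assumptions, not the paper's implementation: it assumes the test compares rows of [X, X~] with rows in which the original and knockoff blocks are swapped, it uses a small hand-written logistic regression on quadratic features as the classifier, and it uses a naive knockoff generator (independent Gaussians with matching marginals). The names `quadratic_features`, `c2st`, and `swap_test` are hypothetical.

```python
import math
import random


def quadratic_features(row):
    """Raw coordinates plus all pairwise products, so a linear model can see correlations."""
    products = [row[i] * row[j] for i in range(len(row)) for j in range(i, len(row))]
    return list(row) + products


def c2st(sample_p, sample_q, seed=0, epochs=30, lr=0.01):
    """Return (held-out accuracy, one-sided p-value) of a classifier two-sample test."""
    rng = random.Random(seed)
    data = [(quadratic_features(r), 0) for r in sample_p]
    data += [(quadratic_features(r), 1) for r in sample_q]
    rng.shuffle(data)
    half = len(data) // 2
    train, test = data[:half], data[half:]

    # Logistic regression fitted by plain stochastic gradient descent (illustrative choice).
    weights, bias = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        for x, y in train:
            score = bias + sum(w * xi for w, xi in zip(weights, x))
            score = max(-30.0, min(30.0, score))  # clamp to keep exp() well-behaved
            grad = 1.0 / (1.0 + math.exp(-score)) - y
            weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
            bias -= lr * grad

    correct = 0
    for x, y in test:
        score = bias + sum(w * xi for w, xi in zip(weights, x))
        correct += int((score > 0) == (y == 1))
    accuracy = correct / len(test)

    # Under the null (same distribution), accuracy is roughly N(0.5, 1 / (4 * n_test)).
    z = (accuracy - 0.5) / math.sqrt(0.25 / len(test))
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))
    return accuracy, p_value


def swap_test(n, rho, seed=0):
    """Run the test on a 2-feature Gaussian with correlation rho and naive knockoffs."""
    rng = random.Random(seed)
    unswapped, swapped = [], []
    for i in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = [z1, rho * z1 + math.sqrt(1 - rho ** 2) * z2]
        # Assumed naive generator: correct marginals, but ignores dependence between features.
        x_tilde = [rng.gauss(0, 1), rng.gauss(0, 1)]
        # Disjoint rows go to each sample so the two samples are independent.
        if i % 2 == 0:
            unswapped.append(x + x_tilde)
        else:
            swapped.append(x_tilde + x)
    return c2st(unswapped, swapped, seed=seed)


acc_indep, p_indep = swap_test(n=2000, rho=0.0)  # generator is valid here
acc_dep, p_dep = swap_test(n=2000, rho=0.9)      # generator breaks exchangeability here
print(f"independent features: accuracy={acc_indep:.3f}, p-value={p_indep:.3g}")
print(f"correlated features:  accuracy={acc_dep:.3f}, p-value={p_dep:.3g}")
```

With independent features the swapped and unswapped rows share the same distribution, so held-out accuracy should stay near chance. With strongly correlated features the same generator no longer matches the dependence structure, and the classifier can tell the two samples apart. Any stronger classifier could replace the logistic regression used here.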
Problem

Research questions and friction points this paper is trying to address.

Diagnosing non-exchangeability in the knockoff framework
Addressing false-positive inflation caused by violated exchangeability
Proposing alternative knockoff construction for reliable inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diagnostic tool using Classifier Two-Sample Tests
Alternative knockoff construction via variable prediction
Computationally efficient variant that restores error control
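The prediction-based construction listed above could be sketched as follows. The summary does not spell out the algorithm, so this is one plausible reading rather than the authors' method: each variable is regressed on all the others, and its knockoff is the fitted value plus a randomly permuted residual. The linear predictor, the residual permutation, and the names `solve`, `fit_predict`, and `prediction_knockoffs` are all assumptions made for illustration.

```python
import random


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_predict(features, target, ridge=1e-6):
    """Least-squares fit of target on features (with intercept); returns fitted values."""
    design = [[1.0] + row for row in features]
    k = len(design[0])
    gram = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    for i in range(k):
        gram[i][i] += ridge  # tiny ridge term for numerical stability
    rhs = [sum(r[i] * t for r, t in zip(design, target)) for i in range(k)]
    coef = solve(gram, rhs)
    return [sum(c * v for c, v in zip(coef, r)) for r in design]


def prediction_knockoffs(rows, seed=0):
    """Knockoff column j = prediction of X_j from all other columns + permuted residuals."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    knockoffs = [[0.0] * p for _ in range(n)]
    for j in range(p):
        others = [[row[k] for k in range(p) if k != j] for row in rows]
        target = [row[j] for row in rows]
        fitted = fit_predict(others, target)
        residuals = [t - f for t, f in zip(target, fitted)]
        rng.shuffle(residuals)  # assumed step: break the row-wise link to X_j
        for i in range(n):
            knockoffs[i][j] = fitted[i] + residuals[i]
    return knockoffs


# Toy data: three features sharing a common factor, hence strongly dependent.
gen = random.Random(1)
data = []
for _ in range(200):
    shared = gen.gauss(0, 1)
    data.append([0.8 * shared + 0.6 * gen.gauss(0, 1) for _ in range(3)])

knock = prediction_knockoffs(data)
print(len(knock), len(knock[0]))
```

Because each knockoff column is the fitted values plus a permutation of the residuals, it keeps the column mean of the original variable while inheriting its dependence on the other variables through the predictor. The validity of the resulting knockoffs would still need to be checked, for example with a classifier two-sample test.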