DiffKnock: Diffusion-based Knockoff Statistics for Neural Networks Inference

📅 2025-10-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing knockoff methods for high-dimensional feature selection struggle to model complex dependencies and nonlinear associations among features, while also lacking finite-sample false discovery rate (FDR) control. To address these limitations, this paper introduces DiffKnockoff—the first framework to integrate diffusion models into knockoff variable construction. DiffKnockoff leverages the iterative denoising process of diffusion models to accurately capture higher-order dependencies in the original features. It further designs an antisymmetric importance statistic by combining neural network gradients with filtered test statistics, enabling high statistical power for detecting nonlinear associations while guaranteeing rigorous finite-sample FDR control. Experiments demonstrate that DiffKnockoff significantly outperforms autoencoder-based baselines on synthetic data. In application to single-cell RNA-seq analysis, it successfully identifies key regulatory genes in the NF-κB pathway, confirming its biological interpretability and practical utility.

Technology Category

Application Category

📝 Abstract
We introduce DiffKnock, a diffusion-based knockoff framework for high-dimensional feature selection with finite-sample false discovery rate (FDR) control. DiffKnock addresses two key limitations of existing knockoff methods: preserving complex feature dependencies and detecting non-linear associations. Our approach trains diffusion models to generate valid knockoffs and uses neural network--based gradient and filter statistics to construct antisymmetric feature importance measures. Through simulations, we showed that DiffKnock achieved higher power than autoencoder-based knockoffs while maintaining target FDR, indicating its superior performance in scenarios involving complex non-linear architectures. Applied to murine single-cell RNA-seq data of LPS-stimulated macrophages, DiffKnock identifies canonical NF-$κ$B target genes (Ccl3, Hmox1) and regulators (Fosb, Pdgfb). These results highlight that, by combining the flexibility of deep generative models with rigorous statistical guarantees, DiffKnock is a powerful and reliable tool for analyzing single-cell RNA-seq data, as well as high-dimensional and structured data in other domains.
Problem

Research questions and friction points this paper is trying to address.

Controls false discovery rate in high-dimensional feature selection
Preserves complex dependencies and detects nonlinear associations
Generates valid knockoffs using diffusion models for neural networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses diffusion models to generate valid knockoffs
Employs neural network-based statistics for feature importance
Combines deep generative models with statistical guarantees
🔎 Similar Papers
No similar papers found.