Controlling False Discovery in Arbitrarily Structured Hypothesis Spaces via Reproducing Kernels

📅 2026-05-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes a structured false discovery rate (FDR) control method based on reproducing kernel Hilbert spaces (RKHS) for large-scale hypothesis testing in which the hypotheses are related by spatial proximity, graph connectivity, or hierarchy. FDR control is reformulated as a regularized learning problem in an RKHS, and the kernel is chosen to match the structure at hand: a continuous domain, a graph, or a hierarchy. The resulting decision rules are smooth and can be evaluated at unobserved locations. According to the summary, it is the first method to achieve structure-aware FDR control within the RKHS framework, and it combines likelihood-based hyperparameter selection, sample-efficient experimental design, and decision rules that are proven to control the FDR. Applied to differential expression analysis in spatial transcriptomics and on protein–protein interaction networks, the method is reported to substantially improve statistical power while maintaining FDR control.
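To make "one algorithm, different kernels" concrete, here is a minimal sketch of two standard kernels that could play this role: a Gaussian (RBF) kernel for continuous coordinates and a diffusion kernel for a graph. Both are textbook choices used here as assumptions; the summary does not say which kernels the paper actually uses, and the function names and the `length_scale` and `beta` parameters are hypothetical.

```python
import numpy as np

def rbf_kernel(coords, length_scale=1.0):
    """Gaussian kernel on continuous coordinates (illustrative choice, not the paper's)."""
    x = np.asarray(coords, dtype=float)
    # Pairwise squared Euclidean distances between all rows of x.
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * length_scale ** 2))

def graph_diffusion_kernel(adjacency, beta=1.0):
    """Diffusion kernel exp(-beta * L) for an undirected graph (illustrative choice)."""
    a = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(a.sum(axis=1)) - a
    # L is symmetric, so exp(-beta * L) = U diag(exp(-beta * lambda)) U^T.
    eigvals, eigvecs = np.linalg.eigh(laplacian)
    return (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T

# Three hypotheses located in the plane, and three hypotheses on a path graph 0 - 1 - 2.
coords = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
path_graph = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
K_space = rbf_kernel(coords, length_scale=1.0)
K_graph = graph_diffusion_kernel(path_graph, beta=0.5)
```

Either matrix is a symmetric positive semi-definite Gram matrix, so a downstream regularized estimator could consume it without knowing whether the hypotheses live in space or on a graph.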

📝 Abstract

Large-scale hypothesis testing is central to modern science, where controlling the False Discovery Rate (FDR) has become the standard approach to managing false positives across many simultaneous tests. Hypotheses rarely exist in isolation; they often exhibit structure through proximity, connectivity, or hierarchy. This structure represents both a challenge and an opportunity: while classical methods treat these dependencies as obstacles requiring conservative correction, leveraging them can substantially increase discovery power. Here, we reframe structured FDR control as a regularized learning problem. By optimizing within a suitable Reproducing Kernel Hilbert Space (RKHS), we introduce a framework that unifies continuous domains, graphs, and hierarchies under a single algorithm through kernel choice alone. This formulation enables smooth solutions in place of the piecewise-constant fits of prior methods, principled likelihood-based hyperparameter selection rather than heuristic tuning, and inference at unobserved locations, which in turn supports sample-efficient experimental design. Building on this estimator, we provide two decision rules that we prove control the FDR. We validate our method on two sources: spatial locations derived from high-dimensional real-world datasets, and a differential gene expression task using protein-protein interaction graphs.
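For reference, the kind of classical, structure-agnostic FDR control that the abstract contrasts against can be sketched with the Benjamini-Hochberg step-up procedure. This is the standard baseline, not the paper's RKHS method, and the function name and example p-values below are illustrative assumptions.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.1):
    """Return a boolean rejection mask from the Benjamini-Hochberg step-up rule."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # The i-th smallest p-value is compared against alpha * i / m.
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        # Step-up: reject every hypothesis up to the largest rank that passes.
        k = np.nonzero(passed)[0].max()
        reject[order[:k + 1]] = True
    return reject

# Made-up p-values, for illustration only.
p_example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
rejected = benjamini_hochberg(p_example, alpha=0.05)
```

The rule looks only at the sorted p-values and ignores where each hypothesis sits in space or on a graph, which is the information the structured approach described above aims to exploit.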

Problem

Research questions and friction points this paper is trying to address.

False Discovery Rate

Structured Hypothesis Testing

Reproducing Kernel Hilbert Space

Multiple Testing

Hypothesis Dependence

Innovation

Methods, ideas, or system contributions that make the work stand out.

False Discovery Rate

Reproducing Kernel Hilbert Space

Structured Hypothesis Testing