Finding Distributions that Differ, with False Discovery Rate Control

📅 2025-05-19
🤖 AI Summary
This paper addresses the problem of identifying which of several comparison groups have distributions that differ from a single reference distribution. It proposes a nonparametric multiple testing method with exact, distribution-free control of the false discovery rate (FDR). The key contributions are: (1) introducing batch conformal p-values and showing that they satisfy positive regression dependence across the groups, which enables FDR control through the Benjamini-Hochberg procedure; and (2) a novel proof technique based on the inductive construction of rank vectors with almost sure dominance under exchangeability. In simulations, the method in some cases performs comparably to methods that know the data-generating normal distribution, and it has more power than direct approaches based on conformal out-of-distribution detection. The method is illustrated on two real datasets: a Hepatitis C treatment dataset, where it identifies patient groups with large treatment effects, and the Current Population Survey dataset, where it identifies sub-populations with long work hours.

📝 Abstract
We consider the problem of comparing a reference distribution with several other distributions. Given samples from the reference group and from each comparison group, we aim to identify the comparison groups whose distributions differ from that of the reference group. Viewing this as a multiple testing problem, we introduce a methodology that provides exact, distribution-free control of the false discovery rate. To do so, we introduce the concept of batch conformal p-values and demonstrate that they satisfy positive regression dependence across the groups [Benjamini and Yekutieli, 2001], thereby enabling control of the false discovery rate through the Benjamini-Hochberg procedure. The proof of positive regression dependence introduces a novel technique for the inductive construction of rank vectors with almost sure dominance under exchangeability. We evaluate the performance of the proposed procedure through simulations, where, despite being distribution-free, in some cases it shows performance comparable to methods with knowledge of the data-generating normal distribution, and it has more power than direct approaches based on conformal out-of-distribution detection. Further, we illustrate our methods on a Hepatitis C treatment dataset, where they can identify patient groups with large treatment effects, and on the Current Population Survey dataset, where they can identify sub-populations with long work hours.
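
For readers unfamiliar with the Benjamini-Hochberg step-up procedure named in the abstract, a minimal Python sketch follows. It assumes one p-value per comparison group is already available; the function name `benjamini_hochberg`, the default level `alpha=0.1`, and the example p-values are illustrative choices, not taken from the paper.

```python
import numpy as np

def benjamini_hochberg(pvalues, alpha=0.1):
    """Return a boolean mask of rejected hypotheses (illustrative helper)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)                         # indices of p-values, smallest first
    thresholds = alpha * np.arange(1, m + 1) / m  # step-up thresholds alpha * i / m
    passed = np.nonzero(p[order] <= thresholds)[0]
    rejected = np.zeros(m, dtype=bool)
    if passed.size > 0:
        k = passed[-1]                            # largest i with p_(i) <= alpha * i / m
        rejected[order[:k + 1]] = True            # reject the k + 1 smallest p-values
    return rejected

# Made-up p-values for five hypothetical comparison groups.
example_rejections = benjamini_hochberg([0.001, 0.8, 0.012, 0.04, 0.5], alpha=0.1)
```

The procedure rejects every hypothesis whose p-value is among the smallest ones up to the last threshold crossing, so a group can be flagged even if its own p-value exceeds its own threshold, as long as a larger p-value passes a later one.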
Problem

Research questions and friction points this paper is trying to address.

Identify comparison groups differing from reference distribution
Control false discovery rate in multiple testing
Develop batch conformal p-values for distribution-free analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses batch conformal p-values for distribution comparison
Controls false discovery rate via Benjamini-Hochberg procedure
Introduces inductive rank vector construction technique
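
To give a feel for what a rank-based, distribution-free p-value for a whole batch can look like, here is a rough Python sketch. It is not the paper's definition of batch conformal p-values, which this page does not spell out. As a stand-in it uses the rank sum of the comparison batch within the pooled sample, calibrated by Monte Carlo draws under exchangeability. The function name `batch_rank_pvalue`, the one-sided "larger values" alternative, and `n_draws` are all assumptions made for illustration.

```python
import numpy as np

def batch_rank_pvalue(reference, batch, n_draws=2000, rng=None):
    """Monte Carlo rank-sum p-value for one comparison batch (illustrative stand-in)."""
    rng = np.random.default_rng(rng)
    reference = np.asarray(reference, dtype=float)
    batch = np.asarray(batch, dtype=float)
    pooled = np.concatenate([reference, batch])
    n, k = pooled.size, batch.size
    # Ranks 1..n of the pooled sample; assumes continuous data, i.e. no ties.
    ranks = np.empty(n)
    ranks[np.argsort(pooled, kind="stable")] = np.arange(1, n + 1)
    observed = ranks[-k:].sum()  # rank sum of the batch (the last k pooled entries)
    # Under exchangeability, the batch ranks are a uniformly random k-subset of 1..n.
    # rng.choice draws from 0..n-1, so adding k shifts the sum to ranks 1..n.
    null = np.array([
        rng.choice(n, size=k, replace=False).sum() + k
        for _ in range(n_draws)
    ])
    # One-sided p-value with the usual +1 correction, so it is never exactly zero.
    return (1 + np.sum(null >= observed)) / (n_draws + 1)

# Toy data: one batch entirely above the reference sample, one entirely below.
p_high = batch_rank_pvalue(np.arange(100), np.arange(100, 110), rng=0)
p_low = batch_rank_pvalue(np.arange(100), np.arange(-10, 0), rng=0)
```

Because every group is compared against the same reference sample, p-values computed this way are dependent across groups. That dependence is why the abstract's positive regression dependence result matters before applying the Benjamini-Hochberg procedure.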
Yonghoon Lee
University of Pennsylvania
Statistics
Edgar Dobriban
Statistics & Computer Science, University of Pennsylvania
Statistics, Machine Learning, AI
E. T. Tchetgen
Department of Statistics and Data Science, The Wharton School, University of Pennsylvania