🤖 AI Summary
This study addresses false discovery rate (FDR) control in high-throughput biological experiments, where thousands of linear models, one per unit, must be fitted with very few samples each. The authors show that limma-trend can be understood within an empirical partially Bayes framework: it computes approximate partially Bayes p-values that condition on the residual sample variance and a unit-level summary. The same framework explains why MAnorm2 can sometimes fail to control FDR. Building on this, the paper proposes two nonparametric generalizations. The first estimates the residual variance prior by nonparametric maximum likelihood and, under dense signals, asymptotically controls FDR even when the variance trend is misspecified or inconsistently estimated. The second learns the full conditional variance distribution, given the unit-level summary, directly.
📝 Abstract
In high-throughput biology, it is common to fit thousands of linear regressions -- one per gene, protein, or other unit -- with very few samples per unit. Limma-trend, one of the most widely used methods in this setting, improves power by shrinking variance estimates parametrically toward a fitted curve (the trend) relating variance to a unit-level summary (e.g., average intensity, peptide count), before computing p-values and applying the Benjamini-Hochberg procedure to control the false discovery rate (FDR). We study limma-trend through the lens of empirical partially Bayes inference, a paradigm in which a prior is posited and estimated for the nuisance parameters while parameters of interest remain fixed. From this perspective, limma-trend computes approximate partially Bayes p-values that condition on the residual sample variance and the unit-level summary. The same framework explains why MAnorm2, a popular variant for ChIP-seq, can sometimes fail to control FDR. We then derive a nonparametric generalization of limma-trend that estimates the residual variance prior using nonparametric maximum likelihood. Under dense signals, this procedure asymptotically controls the FDR -- even when the trend is misspecified or inconsistently estimated. To allow the full shape of the conditional variance distribution to depend on the unit-level summary, we develop a second procedure that learns it directly.
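To make the pipeline described in the abstract concrete, here is a minimal Python sketch of its two ends: shrinking each unit's sample variance toward a trend value, and applying the Benjamini-Hochberg step-up rule to a list of p-values. It is an illustration under stated assumptions, not the authors' implementation. The shrinkage uses the standard limma-style weighted average with a prior degrees-of-freedom parameter. The names `shrink_variances`, `benjamini_hochberg`, `trend_s2`, and `df_prior` are hypothetical, and the trend values and prior degrees of freedom are taken as given rather than estimated. The intermediate step of turning moderated t-statistics into p-values needs a t distribution with `df_prior + df_resid` degrees of freedom and is omitted here.

```python
from typing import List, Sequence


def shrink_variances(
    s2: Sequence[float],        # per-unit residual sample variances
    df_resid: float,            # residual degrees of freedom (same for every unit here)
    trend_s2: Sequence[float],  # assumed: fitted trend value at each unit's summary
    df_prior: float,            # assumed: prior degrees of freedom, taken as given
) -> List[float]:
    """Shrink each sample variance toward its trend value.

    Uses the weighted average (df_prior * trend + df_resid * s2) / (df_prior + df_resid).
    """
    total = df_prior + df_resid
    return [(df_prior * t + df_resid * s) / total for s, t in zip(s2, trend_s2)]


def benjamini_hochberg(pvals: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Return one boolean per p-value, True where BH rejects at level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])

    # Step-up rule: find the largest rank k with p_(k) <= alpha * k / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k_max = rank

    # Reject the k_max smallest p-values, even if some of them missed their own threshold.
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected
```

For example, `shrink_variances([1.0, 4.0], 2, [2.0, 2.0], 2)` pulls both variances halfway toward the trend value 2.0. The nonparametric procedures in the abstract replace this parametric shrinkage with a prior estimated from the data, which the sketch does not attempt.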