How does limma-trend work? An empirical partially Bayes perspective

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses false discovery rate (FDR) control in high-throughput biological experiments, where thousands of linear models are fitted simultaneously, each with very few samples. The authors show that limma-trend can be understood within an empirical partially Bayes framework: it computes approximate partially Bayes p-values that condition on the residual sample variance and a unit-level summary (e.g., average intensity or peptide count). The same framework explains why MAnorm2, a limma-trend variant used for ChIP-seq, can sometimes fail to control FDR. Building on this, the paper proposes two nonparametric generalizations. The first estimates the residual variance prior by nonparametric maximum likelihood and, under dense signals, asymptotically controls FDR even when the trend is misspecified or inconsistently estimated. The second learns the full conditional variance distribution given the unit-level summary directly.

📝 Abstract

In high-throughput biology, it is common to fit thousands of linear regressions -- one per gene, protein, or other unit -- with very few samples per unit. Limma-trend, one of the most widely used methods in this setting, improves power by shrinking variance estimates parametrically toward a fitted curve (the trend) relating variance to a unit-level summary (e.g., average intensity, peptide count), before computing p-values and applying the Benjamini-Hochberg procedure to control the false discovery rate (FDR). We study limma-trend through the lens of empirical partially Bayes inference, a paradigm in which a prior is posited and estimated for the nuisance parameters while parameters of interest remain fixed. From this perspective, limma-trend computes approximate partially Bayes p-values that condition on the residual sample variance and the unit-level summary. The same framework explains why MAnorm2, a popular variant for ChIP-seq, can sometimes fail to control FDR. We then derive a nonparametric generalization of limma-trend that estimates the residual variance prior using nonparametric maximum likelihood. Under dense signals, this procedure asymptotically controls the FDR -- even when the trend is misspecified or inconsistently estimated. To allow the full shape of the conditional variance distribution to depend on the unit-level summary, we develop a second procedure that learns it directly.
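
To make the pipeline described in the abstract concrete, here is a minimal Python sketch of limma-style variance moderation followed by the Benjamini-Hochberg procedure. It is neither the paper's code nor the limma implementation. The trend function `toy_trend`, the prior degrees of freedom `d0`, and the toy data are all hypothetical; in practice the trend and `d0` are estimated from the data. The shrinkage step uses the standard moderated-t form, in which the shrunken variance is a degrees-of-freedom-weighted average of the sample variance and the trend value. The t tail probability is computed by Simpson's rule only to keep the sketch free of third-party dependencies.

```python
import math


def shrink_variance(s2, d, s2_prior, d0):
    """Weighted average of the sample variance s2 (d df) and the trend value s2_prior (d0 df)."""
    return (d0 * s2_prior + d * s2) / (d0 + d)


def t_two_sided_pvalue(t, df, n_steps=2000):
    """Two-sided Student-t p-value via Simpson's rule on the density (n_steps must be even)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / n_steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    central_mass = 2 * total * h / 3  # approximates P(|T| <= a)
    return max(0.0, 1.0 - central_mass)


def moderated_pvalue(beta_hat, v, s2, d, s2_prior, d0):
    """Moderated t-test: plug the shrunken variance into the t-statistic, use d0 + d df."""
    s2_tilde = shrink_variance(s2, d, s2_prior, d0)
    t_mod = beta_hat / math.sqrt(s2_tilde * v)
    return t_two_sided_pvalue(t_mod, d0 + d)


def benjamini_hochberg(pvalues, alpha=0.05):
    """Step-up BH: reject the k smallest p-values, k = max rank with p_(k) <= alpha * k / m."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


def toy_trend(avg_intensity):
    """Hypothetical variance trend: variance decreases with average intensity."""
    return math.exp(-1.0 - 0.2 * avg_intensity)


if __name__ == "__main__":
    d0 = 4.0  # assumed prior degrees of freedom (estimated from data in practice)
    # Toy units: (beta_hat, v, s2, residual df, average intensity); values are made up.
    units = [
        (1.20, 0.5, 0.10, 4, 8.0),
        (0.05, 0.5, 0.30, 4, 3.0),
        (-0.90, 0.5, 0.02, 4, 10.0),
        (0.10, 0.5, 0.15, 4, 6.0),
    ]
    pvals = [
        moderated_pvalue(b, v, s2, d, toy_trend(a), d0)
        for (b, v, s2, d, a) in units
    ]
    for p, r in zip(pvals, benjamini_hochberg(pvals, alpha=0.05)):
        print(f"p = {p:.4g}, rejected = {r}")
```

Because the trend value enters only through `s2_prior`, swapping `toy_trend` for any other fitted curve leaves the rest of the sketch unchanged. That also marks the spot where a misspecified trend would distort the resulting p-values, which is the concern the nonparametric procedures in the abstract are meant to address.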

Problem

Research questions and friction points this paper is trying to address.

high-throughput biology

variance estimation

false discovery rate

empirical Bayes

linear regression

Innovation

Methods, ideas, or system contributions that make the work stand out.

empirical partially Bayes

limma-trend

nonparametric maximum likelihood