Sparse Bayesian Partially Identified Models for Sequence Count Data

📅 2025-12-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Compositional nature of sequencing count data in genomics renders absolute abundances unidentifiable; conventional normalization methods rely on a strong (and often unrealistic) constant-total-abundance assumption, where even minor violations induce >70% Type I/II error rates. While existing sparse methods incorporate sparsity assumptions, they treat sparsity as deterministic, ignoring its inherent uncertainty. Method: We propose the Sparse Bayesian Partial Identification Model (PIM), the first framework within Scale-Reliant Inference to explicitly model uncertainty in the sparsity assumption—integrating a Bayesian hierarchical structure, the horseshoe prior for adaptive shrinkage, and partial identification theory. Contribution/Results: PIM enables scale-invariant, theoretically consistent inference. Simulation and real-data analyses demonstrate that PIM reduces Type I/II error rates by over 50% and achieves substantially higher statistical power than state-of-the-art methods, particularly under variable total abundance—a setting where competing approaches fail catastrophically.

Technology Category

Application Category

📝 Abstract
In genomics, differential abundance and expression analyses are complicated by the compositional nature of sequence count data, which reflect only relative-not absolute-abundances or expression levels. Many existing methods attempt to address this limitation through data normalizations, but we have shown that such approaches imply strong, often biologically implausible assumptions about total microbial load or total gene expression. Even modest violations of these assumptions can inflate Type I and Type II error rates to over 70%. Sparse estimators have been proposed as an alternative, leveraging the assumption that only a small subset of taxa (or genes) change between conditions. However, we show that current sparse methods suffer from similar pathologies because they treat sparsity assumptions as fixed and ignore the uncertainty inherent in these assumptions. We introduce a sparse Bayesian Partially Identified Model (PIM) that addresses this limitation by explicitly modeling uncertainty in sparsity assumptions. Our method extends the Scale-Reliant Inference (SRI) framework to the sparse setting, providing a principled approach to differential analysis under scale uncertainty. We establish theoretical consistency of the proposed estimator and, through extensive simulations and real data analyses, demonstrate substantial reductions in both Type I and Type II errors compared to existing methods.
Problem

Research questions and friction points this paper is trying to address.

Addresses compositional bias in genomic sequence count data
Models uncertainty in sparsity assumptions for differential analysis
Reduces Type I and Type II errors under scale uncertainty
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Bayesian model handles sparsity assumption uncertainty
Extends Scale-Reliant Inference to sparse settings
Reduces Type I and Type II errors significantly
🔎 Similar Papers
No similar papers found.
W
Won Gu
Department of Statistics, Pennsylvania State University, University Park, PA, U.S.A.
Francesca Chiaromonte
Francesca Chiaromonte
Professor of Statistics, Pennsylvania State University, Sant'Anna School of Advanced Studies
StatisticsGenomicsBioinformaticsMeteorologyEconomics
J
Justin D. Silverman
Department of Statistics, Pennsylvania State University, University Park, PA, U.S.A.; College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, U.S.A.; Department of Medicine, Pennsylvania State University, Hershey, PA, U.S.A.; Institute for Computational and Data Science, Pennsylvania State University, University Park, PA, U.S.A.