Sparse Bayesian Partially Identified Models for Sequence Count Data

📅 2025-12-12

📈 Citations: 0

✨ Influential: 0

career value

210K/year

🤖 AI Summary

Compositional nature of sequencing count data in genomics renders absolute abundances unidentifiable; conventional normalization methods rely on a strong (and often unrealistic) constant-total-abundance assumption, where even minor violations induce >70% Type I/II error rates. While existing sparse methods incorporate sparsity assumptions, they treat sparsity as deterministic, ignoring its inherent uncertainty. Method: We propose the Sparse Bayesian Partial Identification Model (PIM), the first framework within Scale-Reliant Inference to explicitly model uncertainty in the sparsity assumption—integrating a Bayesian hierarchical structure, the horseshoe prior for adaptive shrinkage, and partial identification theory. Contribution/Results: PIM enables scale-invariant, theoretically consistent inference. Simulation and real-data analyses demonstrate that PIM reduces Type I/II error rates by over 50% and achieves substantially higher statistical power than state-of-the-art methods, particularly under variable total abundance—a setting where competing approaches fail catastrophically.

Technology Category

Application Category

📝 Abstract

In genomics, differential abundance and expression analyses are complicated by the compositional nature of sequence count data, which reflect only relative-not absolute-abundances or expression levels. Many existing methods attempt to address this limitation through data normalizations, but we have shown that such approaches imply strong, often biologically implausible assumptions about total microbial load or total gene expression. Even modest violations of these assumptions can inflate Type I and Type II error rates to over 70%. Sparse estimators have been proposed as an alternative, leveraging the assumption that only a small subset of taxa (or genes) change between conditions. However, we show that current sparse methods suffer from similar pathologies because they treat sparsity assumptions as fixed and ignore the uncertainty inherent in these assumptions. We introduce a sparse Bayesian Partially Identified Model (PIM) that addresses this limitation by explicitly modeling uncertainty in sparsity assumptions. Our method extends the Scale-Reliant Inference (SRI) framework to the sparse setting, providing a principled approach to differential analysis under scale uncertainty. We establish theoretical consistency of the proposed estimator and, through extensive simulations and real data analyses, demonstrate substantial reductions in both Type I and Type II errors compared to existing methods.

Problem

Research questions and friction points this paper is trying to address.

Addresses compositional bias in genomic sequence count data

Models uncertainty in sparsity assumptions for differential analysis

Reduces Type I and Type II errors under scale uncertainty

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse Bayesian model handles sparsity assumption uncertainty

Extends Scale-Reliant Inference to sparse settings

Reduces Type I and Type II errors significantly

🔎 Similar Papers

Dy-mer: An Explainable DNA Sequence Representation Scheme using Sparse Recovery