A critical comparison of handling zeros in high-dimensional compositional count data

📅 2026-05-21

🤖 AI Summary
High-dimensional compositional count data, such as microbiome sequencing data, exhibit excessive zeros, overdispersion, and a discrete lattice structure. These properties violate the continuity assumptions of traditional log-ratio methods and can bias analyses. This review systematically evaluates three zero-handling strategies: zero-tolerant transformations, imputation of rounded zeros, and modeling of essential zeros, comparing their performance on both simulated and real datasets. It integrates and quantifies the applicability boundaries of these approaches under the joint constraints of compositionality, zero inflation, and discreteness, and shows that data discreteness critically affects imputation accuracy. The findings provide practical guidance for method selection and highlight the need for future models that jointly account for compositional structure, zero inflation, and the discrete nature of counts.
📝 Abstract
The growing use of high-throughput sequencing (HTS) has enabled the large-scale production of compositional count data, driving progress in microbiome research. However, such count data are often high-dimensional, over-dispersed, and heavily zero-inflated, and they conflict with the continuity assumptions underlying log-ratio-based compositional data analysis (CoDA), creating substantial methodological challenges. This review provides an overview of zero-handling strategies in compositional data, covering zero-tolerant transformations, imputation approaches for rounded zeros, and statistical models for essential zeros. We specifically highlight the problems that arise when applying the log-ratio framework to sequencing-derived compositional count data, where violations of continuity can induce numerical instabilities and biased statistical inferences. Motivated by these issues, we systematically examine how existing imputation strategies behave when adapted to discrete, zero-inflated count data, including an evaluation of how the discrete, lattice-valued nature of the data affects imputation performance. Overall, this review consolidates scattered methodological developments, clarifies appropriate use cases, and provides a detailed discussion of the comparison results. It also identifies open challenges that motivate future zero-handling frameworks capable of jointly accommodating compositional constraints, zero inflation, and the lattice nature of count data.
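The conflict between zeros and the log-ratio framework described in the abstract can be illustrated with a minimal sketch. It is not taken from the review: the function names, the toy count vector, and the default replacement value `delta = 0.5 / total` are all illustrative assumptions, and the replacement step is a simple multiplicative replacement of the kind commonly used in CoDA, not necessarily one of the methods the review evaluates.

```python
import math

def clr(parts):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in parts]  # math.log(0) raises ValueError
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def multiplicative_replacement(counts, delta=None):
    """Close counts to proportions, set zeros to delta, and rescale the
    non-zero parts so the composition still sums to 1.

    delta defaults to half of the smallest observable proportion (1/total);
    this default is an assumption made for illustration only.
    """
    total = sum(counts)
    props = [c / total for c in counts]
    n_zero = sum(1 for p in props if p == 0)
    if delta is None:
        delta = 0.5 / total
    return [delta if p == 0 else p * (1 - n_zero * delta) for p in props]

# Hypothetical sample: 5 taxa, 100 reads, two unobserved taxa.
counts = [12, 0, 3, 0, 85]

# The log-ratio transform is undefined as soon as one part is zero.
try:
    clr(counts)
    clr_failed = False
except ValueError:
    clr_failed = True

# After replacement the transform is defined, the composition still sums
# to 1, and ratios between non-zero parts are unchanged.
replaced = multiplicative_replacement(counts)
clr_values = clr(replaced)
```

Because the counts live on a lattice, the smallest non-zero proportion in this sample is 1/100, so any replacement value is a choice made below the resolution of the data. The CLR coordinates of the replaced zeros depend directly on that choice, which is one way to see why discreteness matters for imputation.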
Problem

Research questions and friction points this paper is trying to address.

compositional data
zero-inflation
high-dimensional count data
log-ratio analysis
zero handling
Innovation

Methods, ideas, or system contributions that make the work stand out.

compositional data
zero-inflation
count data
imputation
log-ratio transformation