🤖 AI Summary
Compositional data such as microbiome profiles often contain many zeros, which conventional modeling approaches handle poorly. This work proposes a rectified and renormalized Fisher–Bingham model that represents compositions on the positive orthant of the unit sphere via a square-root transformation. A latent Fisher–Bingham variable is passed through a deterministic transformation that produces exact zeros, so the model needs neither zero imputation nor a separate zero-inflation component. The resulting coherent likelihood is fitted with a Monte Carlo expectation-maximization algorithm and supports a score test for structured differences in composition across groups, a parametric alternative to distance-based methods. Simulations show higher power for detecting structured compositional changes, particularly when many components are zero, and an application to a dietary intervention study identifies microbiota shifts not detected by standard approaches.
📝 Abstract
This paper introduces a rectified and renormalized Fisher-Bingham model for compositional data with zeros, motivated in part by the presence of zeros in microbiota studies. The approach represents compositions through a square-root transformation that maps data to the positive orthant of the unit sphere, and models them via a latent Fisher-Bingham variable followed by a deterministic transformation that induces exact zeros. This construction yields a coherent likelihood without requiring zero imputation or separate modeling of zero and nonzero components. Parameter estimation is performed using a Monte Carlo expectation-maximization algorithm that accommodates the latent structure. We further develop a score test for detecting structured differences in composition across groups, providing a parametric alternative to commonly used distance-based methods. Simulation studies demonstrate that the proposed method closely approximates the induced distribution and achieves higher power for detecting structured compositional changes, particularly when observations include many zero-valued components. An application to a dietary intervention study illustrates that the method identifies meaningful microbiota shifts not detected by standard approaches.
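The geometry described in the abstract can be illustrated with a minimal sketch. It makes two assumptions that the abstract does not confirm: that "rectified and renormalized" means clipping negative coordinates to zero and rescaling to unit length, and that a normalized Gaussian vector is an acceptable stand-in for a latent draw (it is not a Fisher-Bingham sample). The function names `sqrt_transform` and `rectify_renormalize` are hypothetical.

```python
import numpy as np


def sqrt_transform(composition):
    """Map a composition (non-negative parts) to the positive orthant
    of the unit sphere by taking element-wise square roots."""
    x = np.asarray(composition, dtype=float)
    x = x / x.sum()          # close the composition so it sums to 1
    return np.sqrt(x)        # squared norm equals sum(x) = 1


def rectify_renormalize(latent):
    """Assumed form of the deterministic map: clip negative coordinates
    to zero, then rescale the result back to unit length."""
    z = np.maximum(np.asarray(latent, dtype=float), 0.0)
    norm = np.linalg.norm(z)
    if norm == 0.0:
        raise ValueError("latent vector has no positive coordinate")
    return z / norm


rng = np.random.default_rng(0)

# Stand-in latent draw on the unit sphere (NOT a Fisher-Bingham sample).
g = rng.normal(size=5)
latent = g / np.linalg.norm(g)

# Negative latent coordinates become exact zeros after rectification.
y = rectify_renormalize(latent)

# Squaring undoes the square-root transformation and yields a composition.
composition = y ** 2
print(composition, composition.sum())
```

Under these assumptions, every negative latent coordinate maps to an exact zero in the resulting composition, which is how a single continuous latent model could produce zeros without a separate zero-inflation component.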