AI Summary
Existing Poisson-lognormal (PLN) models fail to capture hierarchical structures (such as microbial taxonomies, administrative geographies, or product categories) commonly present in count data, limiting their interpretability and generalizability in ecology, medicine, and related fields. To address this, we propose PLN-Tree: the first generative PLN model that explicitly incorporates a hierarchical tree prior to encode parent-child dependencies among entities. Methodologically, we design a structured variational inference algorithm and establish theoretical guarantees for parameter identifiability. The model supports hierarchy-aware inference and downstream classification tasks. Experiments on synthetic and real-world microbiome datasets demonstrate that PLN-Tree significantly improves accuracy in modeling hierarchical dependencies and enhances ecological interpretability. Our results underscore the critical role of domain-specific prior knowledge (e.g., taxonomic graphs) in modeling complex systems.
Abstract
When studying ecosystems, hierarchical trees are often used to organize entities based on proximity criteria, such as the taxonomy in microbiology, social classes in geography, or product types in retail businesses, offering valuable insights into entity relationships. Despite their significance, current count-data models do not leverage this structured information. In particular, the widely used Poisson log-normal (PLN) model, known for its ability to model interactions between entities from count data, cannot incorporate such hierarchical tree structures, which limits its applicability in domains where they arise. To address this limitation, we introduce the PLN-Tree model as an extension of the PLN model, specifically designed for modeling hierarchical count data. Using structured variational inference techniques, we propose an adapted training procedure and establish identifiability results, strengthening both the theoretical foundations and the practical interpretability of the model. Additionally, we extend our framework to classification tasks as a preprocessing pipeline for compositional data, showcasing its versatility. Experimental evaluations on synthetic datasets as well as real-world microbiome data demonstrate the superior performance of the PLN-Tree model in capturing hierarchical dependencies and providing valuable insights into complex data structures, which shows the practical value of knowledge graphs such as taxonomies in ecosystem modeling.
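To make the setting concrete, the following is a minimal sketch of the standard PLN model that the abstract takes as its starting point: a latent Gaussian vector Z ~ N(mu, Sigma) and counts Y | Z ~ Poisson(exp(Z)). It then sums leaf counts into parent nodes to show what hierarchical count data look like. This is not the PLN-Tree model or its inference procedure, neither of which is specified here; the function names `sample_pln` and `aggregate_to_parents`, the toy parameters, and the two-level tree are all assumptions made for illustration.

```python
import numpy as np


def sample_pln(mu, sigma, n, rng):
    """Draw n count vectors from a plain PLN model.

    Z ~ N(mu, sigma) is the latent Gaussian layer and
    Y | Z ~ Poisson(exp(Z)) gives the observed counts.
    """
    z = rng.multivariate_normal(mu, sigma, size=n)
    return rng.poisson(np.exp(z))


def aggregate_to_parents(leaf_counts, parent_of):
    """Sum leaf counts into parent nodes (hypothetical helper).

    parent_of[j] is the index of the parent of leaf j.
    """
    n_parents = max(parent_of) + 1
    parents = np.zeros((leaf_counts.shape[0], n_parents), dtype=leaf_counts.dtype)
    for leaf, parent in enumerate(parent_of):
        parents[:, parent] += leaf_counts[:, leaf]
    return parents


# Toy example: 4 leaf entities grouped under 2 parents (illustrative values only).
rng = np.random.default_rng(0)
mu = np.array([1.0, 0.5, 2.0, 0.0])
sigma = 0.3 * np.eye(4) + 0.1  # positive-definite covariance with mild correlation
leaves = sample_pln(mu, sigma, n=5, rng=rng)
parents = aggregate_to_parents(leaves, parent_of=[0, 0, 1, 1])
```

In this sketch the covariance `sigma` is the only source of dependence between entities, and the tree is applied after sampling. The abstract's point is that a plain PLN model has no way to use `parent_of` during modeling, which is the gap PLN-Tree is designed to fill.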