🤖 AI Summary
Down syndrome (DS) shows pronounced clinical heterogeneity, yet existing research data remain fragmented, which impedes both mechanistic understanding and translational applications. To address this, we developed a DS-specific knowledge graph platform that integrates data from nine NIH INCLUDE Initiative studies comprising 7,148 participants, 456 conditions, 501 phenotypes, and over 37,000 biospecimens. The graph also incorporates Monarch Initiative resources (4,281 genes and 7,077 variants), yielding more than 1.6 million semantic associations. A unified semantic model for heterogeneous DS data supports both SPARQL and natural-language querying, and graph embeddings combined with path-based reasoning make the platform AI-ready for hypothesis generation. The infrastructure supports cross-study association discovery, systematic genotype–phenotype analysis, and predictive modeling, providing a scalable foundation for investigating the mechanisms behind DS heterogeneity and advancing precision interventions.
📝 Abstract
Trisomy 21 causes Down syndrome, a multifaceted genetic disorder with diverse clinical phenotypes, including heart defects, immune dysfunction, neurodevelopmental differences, and an elevated risk of early-onset dementia. Clinical heterogeneity and data fragmented across studies hinder comprehensive research and translational discovery. The NIH INCLUDE (INvestigation of Co-occurring conditions across the Lifespan to Understand Down syndromE) initiative has assembled harmonized participant-level datasets, yet realizing their potential requires integrative analytical frameworks. We developed a knowledge graph-driven platform that transforms nine INCLUDE studies, comprising 7,148 participants, 456 conditions, 501 phenotypes, and over 37,000 biospecimens, into a unified semantic infrastructure. Cross-resource enrichment with Monarch Initiative data expands coverage to 4,281 genes and 7,077 variants. The resulting knowledge graph contains over 1.6 million semantic associations and is AI-ready, supporting graph embeddings and path-based reasoning for hypothesis generation. Researchers can query the graph through SPARQL or natural-language interfaces. This framework turns static data repositories into dynamic discovery environments, supporting cross-study pattern recognition, predictive modeling, and systematic exploration of genotype-phenotype relationships in Down syndrome.
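The abstract does not specify how path-based reasoning is implemented. One simple form of it can be sketched as follows. Associations are stored as (subject, predicate, object) triples, and a breadth-first search enumerates short paths that link a variant to a participant through intermediate gene, phenotype, and condition nodes. All identifiers, predicates, the undirected traversal, and the `max_hops` cutoff are hypothetical choices for illustration. They are not records from INCLUDE or Monarch and are not the platform's actual method.

```python
from collections import deque

# Hypothetical toy triples (subject, predicate, object); the identifiers are
# illustrative placeholders, not real INCLUDE or Monarch records.
TRIPLES = [
    ("participant:P1", "has_condition", "condition:C1"),
    ("participant:P1", "has_condition", "condition:C2"),
    ("participant:P2", "has_condition", "condition:C2"),
    ("gene:G1", "associated_with", "phenotype:X1"),
    ("condition:C1", "maps_to", "phenotype:X1"),
    ("variant:V1", "in_gene", "gene:G1"),
]


def build_graph(triples):
    """Index triples as an undirected adjacency list, keeping each predicate."""
    graph = {}
    for subj, pred, obj in triples:
        graph.setdefault(subj, []).append((pred, obj))
        graph.setdefault(obj, []).append((pred, subj))
    return graph


def find_paths(graph, start, goal, max_hops=4):
    """Breadth-first enumeration of simple paths (no repeated nodes) from
    start to goal that use at most max_hops edges."""
    paths = []
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            paths.append(path)
            continue
        if len(path) - 1 >= max_hops:
            continue  # hop budget exhausted; do not extend this path
        for _pred, neighbor in graph.get(node, []):
            if neighbor not in path:
                queue.append(path + [neighbor])
    return paths


graph = build_graph(TRIPLES)
# A candidate genotype-to-participant explanation chain:
print(find_paths(graph, "variant:V1", "participant:P1"))
```

In this toy graph, the only chain within four hops is variant → gene → phenotype → condition → participant. Such a chain is the kind of multi-hop link that could be surfaced as a hypothesis for follow-up. Breadth-first order returns shorter explanations first, and the hop limit keeps enumeration tractable on a graph with millions of edges.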