Koza and Koza-Hub for born-interoperable knowledge graph generation using KGX

📅 2025-09-10
🤖 AI Summary
Biomedical knowledge graph (KG) construction has long been hindered by the absence of standardized data formats, resulting in substantial redundant effort during multi-source integration. To address this, we propose a modular, configuration-driven KG ingestion framework built upon the KGX standard. The framework decouples the ingestion pipeline into reusable, atomic operations, enabling automated workflow execution and schema compliance enforcement via declarative YAML configurations. We develop Koza, an open-source toolkit, and Koza-Hub, a curated resource repository, which together support standardized, extensible transformation of data from 30 authoritative biomedical sources into KGX-compliant format. This approach significantly enhances cross-source interoperability and engineering reusability, providing a practical, sustainable technical paradigm for scalable biomedical KG construction.

📝 Abstract
Knowledge graph construction has become essential to the future of biomedical research, but current approaches demand a great deal of redundant labor. This redundancy stems from the lack of data standards and of "knowledge-graph ready" data from sources. We aim to address both problems using the KGX standard. We introduce Koza, a Python software package that streamlines ingesting raw biomedical data into the KGX format, and Koza-Hub, an associated set of conversion processes for thirty gold-standard biomedical data sources. Our approach is to break knowledge graph ingests into a set of primitive operations, configure them through YAML files, and enforce compliance with the chosen data schema.
Problem

Research questions and friction points this paper is trying to address.

Streamlining biomedical knowledge graph construction process
Reducing redundant labor in data standardization
Converting raw biomedical data into KGX format
Innovation

Methods, ideas, or system contributions that make the work stand out.

Python tool for KGX knowledge graph generation
YAML configuration for primitive operations
Standardizes thirty biomedical data sources
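
The three ideas in the abstract (primitive operations, declarative configuration, schema enforcement) can be illustrated with a minimal sketch. This is not Koza's actual API: the config keys, function names, and the `EX:` identifiers are hypothetical, the config is written as a Python dict standing in for a YAML file, and the only assumption about KGX is that an edge record carries `subject`, `predicate`, and `object` fields.

```python
import csv
import io

# Hypothetical ingest config. In a real setup this would live in a YAML
# file; the key names are illustrative and are NOT Koza's config schema.
CONFIG = {
    "delimiter": "\t",
    "columns": ["gene_id", "disease_id"],
    "edge": {
        "subject": "gene_id",
        "object": "disease_id",
        "predicate": "biolink:related_to",  # placeholder predicate
    },
}

# Assumption: every emitted edge must carry these three fields.
REQUIRED_EDGE_FIELDS = ("subject", "predicate", "object")


def read_rows(text, config):
    """Primitive 1: parse delimited text into dict rows."""
    reader = csv.DictReader(io.StringIO(text), delimiter=config["delimiter"])
    missing = set(config["columns"]) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"source is missing columns: {sorted(missing)}")
    yield from reader


def row_to_edge(row, config):
    """Primitive 2: map one source row to an edge record using the config."""
    edge_cfg = config["edge"]
    return {
        "subject": row[edge_cfg["subject"]],
        "predicate": edge_cfg["predicate"],
        "object": row[edge_cfg["object"]],
    }


def validate_edge(edge):
    """Primitive 3: reject edges with missing or empty required fields."""
    for field in REQUIRED_EDGE_FIELDS:
        if not edge.get(field):
            raise ValueError(f"edge is missing required field: {field}")
    return edge


def run_ingest(text, config):
    """Compose the primitives into one config-driven ingest."""
    return [validate_edge(row_to_edge(row, config))
            for row in read_rows(text, config)]


# Made-up source data with placeholder identifiers.
SOURCE = "gene_id\tdisease_id\nEX:gene1\tEX:disease1\n"
edges = run_ingest(SOURCE, CONFIG)
```

Under this design, supporting a new source with the same shape would mean writing a new config rather than new code, which is the kind of reuse the summary above attributes to the framework.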
Daniel R Korn
TISLab, Department of Genetics, University of North Carolina at Chapel Hill
Patrick Golden
TISLab, Department of Genetics, University of North Carolina at Chapel Hill
Aaron Odell
TISLab, Department of Genetics, University of North Carolina at Chapel Hill
Katherina Cortes
Department of Biomedical Informatics, University of Colorado Anschutz Medical Campus
Shilpa Sundar
Carolina Health Informatics Program, University of North Carolina at Chapel Hill
Kevin Schaper
TISLab, Department of Genetics, University of North Carolina at Chapel Hill
Sarah Gehrke
TISLab, Department of Genetics, University of North Carolina at Chapel Hill
Corey Cox
TISLab, Department of Genetics, University of North Carolina at Chapel Hill
Harry Caufield
Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory
Justin Reese
Lawrence Berkeley National Lab
computational biology, bioinformatics, scientific programming
Evan Morris
Renaissance Computing Institute, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States
Christopher J Mungall
Lawrence Berkeley National Laboratory
Bioinformatics, Ontologies, Artificial Intelligence, Systems Biology, Machine Reasoning
Melissa Haendel
University of North Carolina