🤖 AI Summary
Biomedical knowledge graph (KG) construction has long been hindered by the absence of standardized data formats, resulting in substantial redundant effort during multi-source integration. To address this, we propose a modular, configuration-driven KG ingestion framework built upon the KGX standard. The framework decouples the ingestion pipeline into reusable, atomic operations, enabling automated workflow execution and schema compliance enforcement via declarative YAML configurations. We develop Koza, an open-source toolkit, and Koza-Hub, a curated resource repository, which together support standardized, extensible transformation of data from 30 authoritative biomedical sources into KGX-compliant format. This approach significantly enhances cross-source interoperability and engineering reusability, providing a practical, sustainable technical paradigm for scalable biomedical KG construction.
📝 Abstract
Knowledge graph construction has become essential to the future of biomedical research, but current approaches demand a great deal of redundant labor. This redundancy results from a lack of data standards and of "knowledge-graph ready" data from sources. We aim to address these issues using the KGX standard. Herein we introduce Koza, a Python software package that streamlines ingesting raw biomedical information into the KGX format, and the Koza-Hub, an associated set of conversion processes for thirty gold standard biomedical data sources. Our approach is to turn knowledge graph ingests into a set of primitive operations, provide configuration through YAML files, and enforce compliance with the chosen data schema.
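The three ideas in the abstract's last sentence (primitive operations, declarative configuration, schema enforcement) can be illustrated with a small sketch. This is not Koza's actual API: the function names, the `CONFIG` layout, and the `EX:` identifiers are all hypothetical, and a plain Python dict stands in for a parsed YAML file so the example needs only the standard library. The only parts taken from KGX are the edge field names `subject`, `predicate`, and `object`.

```python
import csv
import io

# Hypothetical configuration. In a YAML-driven setup, this dict would be
# the result of parsing the ingest's YAML file.
CONFIG = {
    "delimiter": "\t",
    "required_columns": ["gene_id", "disease_id"],
    "steps": ["strip_whitespace", "drop_incomplete", "to_edge"],
    "edge": {
        "subject": "gene_id",
        "object": "disease_id",
        "predicate": "biolink:related_to",
    },
}

# Minimal stand-in for a schema: the core fields a KGX edge carries.
REQUIRED_EDGE_FIELDS = ("subject", "predicate", "object")


def strip_whitespace(row, config):
    """Primitive: trim surrounding whitespace from every value."""
    return {key: (value or "").strip() for key, value in row.items()}


def drop_incomplete(row, config):
    """Primitive: discard rows missing any required column (returns None)."""
    if all(row.get(column) for column in config["required_columns"]):
        return row
    return None


def to_edge(row, config):
    """Primitive: map source columns onto KGX-style edge fields."""
    mapping = config["edge"]
    return {
        "subject": row[mapping["subject"]],
        "predicate": mapping["predicate"],
        "object": row[mapping["object"]],
    }


# Registry that lets the configuration refer to primitives by name.
PRIMITIVES = {
    "strip_whitespace": strip_whitespace,
    "drop_incomplete": drop_incomplete,
    "to_edge": to_edge,
}


def validate_edge(edge):
    """Enforce the (simplified) schema before an edge is emitted."""
    missing = [field for field in REQUIRED_EDGE_FIELDS if not edge.get(field)]
    if missing:
        raise ValueError(f"edge is missing required fields: {missing}")
    return edge


def ingest(raw_text, config):
    """Run each source row through the configured sequence of primitives."""
    reader = csv.DictReader(io.StringIO(raw_text), delimiter=config["delimiter"])
    edges = []
    for row in reader:
        record = row
        for step in config["steps"]:
            record = PRIMITIVES[step](record, config)
            if record is None:  # a primitive filtered the row out
                break
        if record is not None:
            edges.append(validate_edge(record))
    return edges


# Made-up input: the second data row lacks a disease identifier.
RAW = "gene_id\tdisease_id\nEX:G1\t EX:D1 \nEX:G2\t\n"
edges = ingest(RAW, CONFIG)
```

In this sketch, adding a new source means writing a new configuration rather than new pipeline code, which is the kind of reuse the abstract describes. The real package's configuration keys and transform interface may differ.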