Koza and Koza-Hub for born-interoperable knowledge graph generation using KGX

📅 2025-09-10

🤖 AI Summary

Biomedical knowledge graph (KG) construction has long been hindered by the absence of standardized data formats, which forces substantial redundant effort during multi-source integration. To address this, the authors propose a modular, configuration-driven KG ingestion framework built on the KGX standard. The framework decomposes the ingestion pipeline into reusable, atomic operations, with workflows driven by declarative YAML configurations and checked for compliance with the chosen schema. They present Koza, an open-source Python toolkit, and Koza-Hub, a curated repository of ingests, which together support standardized, extensible transformation of data from 30 authoritative biomedical sources into KGX-compliant format. The approach is intended to improve cross-source interoperability and engineering reuse, offering a practical path toward scalable biomedical KG construction.

📝 Abstract

Knowledge graph construction has become essential to the future of biomedical research, but current approaches demand a large amount of redundant labor. This redundancy stems from a lack of data standards and of "knowledge-graph ready" data from sources. We aim to address these issues using the KGX standard. Herein we introduce Koza, a Python software package that streamlines ingesting raw biomedical information into the KGX format, and the Koza-Hub, an associated set of conversion processes for thirty gold-standard biomedical data sources. Our approach is to decompose knowledge graph ingests into a set of primitive operations, provide configuration through YAML files, and enforce compliance with the chosen data schema.
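The three ideas in the abstract (primitive operations, declarative configuration, schema enforcement) can be illustrated with a minimal Python sketch. This is not Koza's API: the operation names, the `CONFIG` layout (shown as the dictionary a YAML file might parse to), the `run_ingest` helper, and the `EXAMPLE:` identifiers are all hypothetical, and the required-field check assumes only that an edge needs a `subject`, `predicate`, and `object`.

```python
# Hypothetical config, written as the dict a YAML ingest file might parse to.
CONFIG = {
    "steps": [
        {"op": "filter", "column": "taxon", "equals": "NCBITaxon:9606"},
        {"op": "rename", "mapping": {"gene_id": "subject", "disease_id": "object"}},
        {"op": "set", "values": {"predicate": "biolink:gene_associated_with_condition"}},
        {"op": "keep", "columns": ["subject", "predicate", "object"]},
    ],
    # Assumed minimal edge schema, for illustration only.
    "required": ["subject", "predicate", "object"],
}


def op_filter(rows, column, equals):
    """Keep only rows whose column equals the given value."""
    return (r for r in rows if r.get(column) == equals)


def op_rename(rows, mapping):
    """Rename columns according to mapping; other columns pass through."""
    return ({mapping.get(k, k): v for k, v in r.items()} for r in rows)


def op_set(rows, values):
    """Add constant fields to every row."""
    return ({**r, **values} for r in rows)


def op_keep(rows, columns):
    """Drop every column not listed in columns."""
    return ({k: r[k] for k in columns if k in r} for r in rows)


# Registry that lets the config refer to primitives by name.
OPS = {"filter": op_filter, "rename": op_rename, "set": op_set, "keep": op_keep}


def run_ingest(rows, config):
    """Apply the configured steps in order, then check required fields."""
    for step in config["steps"]:
        params = {k: v for k, v in step.items() if k != "op"}
        rows = OPS[step["op"]](rows, **params)
    edges = []
    for row in rows:
        missing = [f for f in config["required"] if not row.get(f)]
        if missing:
            raise ValueError(f"record missing required fields: {missing}")
        edges.append(row)
    return edges


# Made-up source rows; the identifiers are placeholders, not real records.
RAW = [
    {"gene_id": "EXAMPLE:gene1", "disease_id": "EXAMPLE:disease1", "taxon": "NCBITaxon:9606"},
    {"gene_id": "EXAMPLE:gene2", "disease_id": "EXAMPLE:disease1", "taxon": "NCBITaxon:10090"},
]

EDGES = run_ingest(RAW, CONFIG)
```

Because each step is a small function selected by name, a new source in this sketch would mostly need a new configuration rather than new transformation code, which is the kind of reuse the abstract describes.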

Problem

Research questions and friction points this paper is trying to address.

Streamlining the biomedical knowledge graph construction process

Reducing redundant labor in data standardization

Converting raw biomedical data into KGX format
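As a rough illustration of the last point, the sketch below serializes records as tab-separated text in the style of a KGX node file. It assumes the common KGX TSV conventions of one header row and pipe-delimited multivalued fields; the `to_kgx_tsv` helper and the sample node are hypothetical and are not taken from Koza.

```python
import csv
import io


def to_kgx_tsv(records, columns):
    """Render records as TSV text with a header row.

    List or tuple values are joined with "|", the delimiter assumed here
    for multivalued fields such as category.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for rec in records:
        row = []
        for col in columns:
            value = rec.get(col, "")
            if isinstance(value, (list, tuple)):
                value = "|".join(str(v) for v in value)
            row.append(value)
        writer.writerow(row)
    return buf.getvalue()


# Made-up node record; the identifier and name are placeholders.
NODES = [{"id": "EXAMPLE:gene1", "category": ["biolink:Gene"], "name": "gene one"}]
NODES_TSV = to_kgx_tsv(NODES, ["id", "category", "name"])
```

An edge file would follow the same pattern with `subject`, `predicate`, and `object` columns.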

Innovation

Methods, ideas, or system contributions that make the work stand out.

Python tool for KGX knowledge graph generation

YAML configuration for primitive operations

Standardized ingests for thirty biomedical data sources
