BDI-Kit Demo: A Toolkit for Programmable and Conversational Data Harmonization

📅 2026-04-07
🤖 AI Summary
This work addresses data-integration challenges that arise from heterogeneous data schemas, value representations, and domain-specific conventions. It proposes a data harmonization framework that combines programmatic interfaces with natural-language interaction. The framework integrates schema- and value-matching algorithms, AI-augmented reasoning, and composable harmonization primitives, so that developers can build reusable harmonization pipelines through a Python API. A conversational interface also lets domain experts explore, validate, and refine results in natural language. By coupling automated matching with iterative user feedback, the system aims to make data harmonization more efficient and easier to use. Two demonstration scenarios illustrate this.
📝 Abstract
Data harmonization remains a major bottleneck for integrative analysis due to heterogeneity in schemas, value representations, and domain-specific conventions. BDI-Kit provides an extensible toolkit for schema and value matching. It exposes two complementary interfaces tailored to different user needs: a Python API enabling developers to construct harmonization pipelines programmatically, and an AI-assisted chat interface allowing domain experts to harmonize data through natural language dialogue. This demonstration showcases how users interact with BDI-Kit to iteratively explore, validate, and refine schema and value matches through a combination of automated matching, AI-assisted reasoning, and user-driven refinement. We present two scenarios: (i) using the Python API to programmatically compose primitives, examine intermediate outputs, and reuse transformations; and (ii) conversing with the AI assistant in natural language to access BDI-Kit's capabilities and iteratively refine outputs based on the assistant's suggestions.
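The idea of composing harmonization primitives and inspecting their intermediate outputs can be illustrated with a small, self-contained sketch. The summary does not show BDI-Kit's actual Python API, so this code does not call it. The functions `match_schema`, `match_values`, and `harmonize` are hypothetical names. The similarity measures are simple stand-ins rather than BDI-Kit's matching algorithms: token-overlap (Jaccard) similarity on column names, and Python's `difflib` ratio on values. The rows are toy data.

```python
import re
from difflib import get_close_matches


def _tokens(name):
    """Lowercase a column name and split it on non-alphanumeric characters."""
    return {t for t in re.split(r"[^0-9a-z]+", name.lower()) if t}


def match_schema(source_cols, target_cols, threshold=0.5):
    """Hypothetical schema-matching primitive: pair each source column with the
    target column whose name tokens overlap most (Jaccard similarity)."""
    mapping = {}
    for src in source_cols:
        best, best_score = None, 0.0
        for tgt in target_cols:
            a, b = _tokens(src), _tokens(tgt)
            score = len(a & b) / len(a | b) if a | b else 0.0
            if score > best_score:
                best, best_score = tgt, score
        # Columns with no sufficiently similar target stay unmapped.
        if best is not None and best_score >= threshold:
            mapping[src] = best
    return mapping


def match_values(source_values, target_values, cutoff=0.6):
    """Hypothetical value-matching primitive: map each source value to the closest
    target value by difflib string similarity; unmatched values are omitted."""
    by_lower = {t.lower(): t for t in target_values}
    mapping = {}
    for value in source_values:
        hits = get_close_matches(value.strip().lower(), list(by_lower), n=1, cutoff=cutoff)
        if hits:
            mapping[value] = by_lower[hits[0]]
    return mapping


def harmonize(rows, column_map, value_maps):
    """Apply a column mapping and per-column value mappings to a list of dict rows.
    Values without a mapping pass through unchanged so they can be reviewed later."""
    out = []
    for row in rows:
        new_row = {}
        for src, tgt in column_map.items():
            value = row.get(src)
            new_row[tgt] = value_maps.get(src, {}).get(value, value)
        out.append(new_row)
    return out


# Toy source data (note the typo "Femle") and a toy target vocabulary.
rows = [
    {"patient_gender": "Femle", "patient_age": 54},
    {"patient_gender": "MALE", "patient_age": 61},
]
target_schema = {"gender": ["female", "male"], "age": None}

# Step 1: schema matching; the intermediate mapping can be inspected or edited.
column_map = match_schema(rows[0].keys(), target_schema.keys())
# Step 2: value matching for the categorical column only.
value_maps = {
    "patient_gender": match_values(
        {r["patient_gender"] for r in rows}, target_schema["gender"]
    )
}
# Step 3: apply both mappings.
result = harmonize(rows, column_map, value_maps)
print(result)  # [{'gender': 'female', 'age': 54}, {'gender': 'male', 'age': 61}]
```

Each step returns a plain dictionary. A user can therefore inspect or hand-correct an intermediate mapping, such as a wrong value match, before applying it. The same mappings can then be reused on new rows. This mirrors, in miniature, the examine-and-reuse workflow described in scenario (i).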
Problem

Research questions and friction points this paper is trying to address.

data harmonization
schema matching
value representation
heterogeneous data
integrative analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

data harmonization
conversational AI
programmable toolkit
schema matching
natural language interface