BDIViz: An Interactive Visualization System for Biomedical Schema Matching with LLM-Powered Validation

📅 2025-07-21
📈 Citations: 0
Influential: 0
📄 PDF

career value

194K/year
🤖 AI Summary
Biomedical schema matching suffers from low automation accuracy due to high-dimensional, semantically ambiguous attributes and prohibitive manual validation costs. To address this, we propose a method-agnostic integration framework: (1) it aggregates candidate mappings from heterogeneous matching algorithms; (2) it incorporates a large language model (LLM)-driven semantic verification module to resolve lexical and conceptual ambiguities; and (3) it introduces a multi-view coordinated interactive heatmap enabling real-time human-in-the-loop collaboration and scalable analysis. Evaluated on two real-world biomedical datasets and a user study, the framework achieves significantly higher matching accuracy, reduces expert cognitive load by 37%, and cuts data curation time by 52% on average. Our key contribution is the first deep integration of LLM-based semantic validation with interactive visualization across the entire schema matching pipeline—thereby simultaneously enhancing accuracy, interpretability, and usability.

Technology Category

Application Category

📝 Abstract
Biomedical data harmonization is essential for enabling exploratory analyses and meta-studies, but the process of schema matching - identifying semantic correspondences between elements of disparate datasets (schemas) - remains a labor-intensive and error-prone task. Even state-of-the-art automated methods often yield low accuracy when applied to biomedical schemas due to the large number of attributes and nuanced semantic differences between them. We present BDIViz, a novel visual analytics system designed to streamline the schema matching process for biomedical data. Through formative studies with domain experts, we identified key requirements for an effective solution and developed interactive visualization techniques that address both scalability challenges and semantic ambiguity. BDIViz employs an ensemble approach that combines multiple matching methods with LLM-based validation, summarizes matches through interactive heatmaps, and provides coordinated views that enable users to quickly compare attributes and their values. Our method-agnostic design allows the system to integrate various schema matching algorithms and adapt to application-specific needs. Through two biomedical case studies and a within-subject user study with domain experts, we demonstrate that BDIViz significantly improves matching accuracy while reducing cognitive load and curation time compared to baseline approaches.
Problem

Research questions and friction points this paper is trying to address.

Automating biomedical schema matching to reduce manual effort
Improving accuracy in identifying semantic correspondences between datasets
Addressing scalability and semantic ambiguity in biomedical data harmonization
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based validation for schema matching
Interactive heatmaps for match summarization
Method-agnostic design integrating multiple algorithms
🔎 Similar Papers
No similar papers found.
💼 Related Jobs
AI Data Engineer--LLMs / Agentic Systems
Pfizer
The annual base salary for this position ranges from $106,000.00 to $176,600.00. In addition, this position is eligible for participation in Pfizer’s Global Performance Plan with a bonus target of 15.0% of the base salary and eligibility to participate in our share based long term incentive program. We offer comprehensive and generous benefits and programs to help our colleagues lead healthy lives and to support each of life’s moments. Benefits offered include a 401(k) plan with Pfizer Matching Contributions and an additional Pfizer Retirement Savings Contribution, paid vacation, holiday and personal days, paid caregiver/parental and medical leave, and health benefits to include medical, prescription drug, dental and vision coverage. Learn more at Pfizer Candidate Site – U.S. Benefits | (uscandidates.mypfizerbenefits.com). Pfizer compensation structures and benefit packages are aligned based on the location of hire. The United States salary range provided does not apply to Tampa, FL or any location outside of the United States. Relocation assistance may be available based on business needs and/or eligibility.
United States - Massachusetts - Cambridge