An Agentic Model Context Protocol Framework for Medical Concept Standardization

Date: 2025-09-03

AI Summary
Problem: Mapping source medical terms to OMOP Common Data Model (CDM) standard concepts is resource-intensive and error-prone, and the hallucination risk of large language models (LLMs) hinders their clinical deployment. Method: We propose a zero-training, hallucination-mitigating, agent-based concept mapping framework grounded in the Model Context Protocol (MCP). It integrates structured prompt engineering with real-time retrieval from authoritative medical knowledge bases, enabling secure, interpretable semantic reasoning and dynamic vocabulary lookup without fine-tuning or domain-specific training. Contribution/Results: The framework is plug-and-play across multi-institutional clinical research settings. Experiments demonstrate substantial improvements in mapping accuracy and consistency, reduced manual curation effort, and robust support for both exploratory research and production-grade deployment.

Abstract
The Observational Medical Outcomes Partnership (OMOP) common data model (CDM) provides a standardized representation of heterogeneous health data to support large-scale, multi-institutional research. One critical step in data standardization using OMOP CDM is the mapping of source medical terms to OMOP standard concepts, a procedure that is resource-intensive and error-prone. While large language models (LLMs) have the potential to facilitate this process, their tendency toward hallucination makes them unsuitable for clinical deployment without training and expert validation. Here, we developed a zero-training, hallucination-preventive mapping system based on the Model Context Protocol (MCP), a standardized and secure framework allowing LLMs to interact with external resources and tools. The system enables explainable mapping and significantly improves efficiency and accuracy with minimal effort. It provides real-time vocabulary lookups and structured reasoning outputs suitable for immediate use in both exploratory and production environments.

Problem

Research questions and friction points this paper is trying to address.

Mapping source medical terms to OMOP standard concepts
Resource-intensive and error-prone medical data standardization
Preventing LLM hallucinations in clinical concept mapping

Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-training mapping system using Model Context Protocol
Real-time vocabulary lookups with structured reasoning outputs
Hallucination-preventive framework for medical concept standardization
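
The lookup-then-validate idea behind the items above can be illustrated with a minimal sketch. This is not the authors' implementation: the functions `lookup_concepts` and `validate_mapping` are hypothetical stand-ins for tools an MCP server might expose, the in-memory vocabulary replaces a real OMOP vocabulary service, and the concept IDs are placeholders rather than real OMOP IDs. The field names follow the OMOP CONCEPT table, where `standard_concept = "S"` marks a standard concept.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Concept:
    concept_id: int
    concept_name: str
    domain_id: str
    vocabulary_id: str
    standard_concept: Optional[str]  # "S" marks a standard concept in OMOP


# Toy vocabulary. The IDs are placeholders, NOT real OMOP concept IDs.
VOCAB = [
    Concept(1001, "Type 2 diabetes mellitus", "Condition", "SNOMED", "S"),
    Concept(1002, "Diabetes mellitus type 2, uncontrolled", "Condition", "ICD9CM", None),
    Concept(1003, "Essential hypertension", "Condition", "SNOMED", "S"),
]


def lookup_concepts(query: str, standard_only: bool = True) -> list[Concept]:
    """Hypothetical lookup tool: every query token must appear in the name."""
    tokens = query.lower().split()
    hits = [c for c in VOCAB if all(t in c.concept_name.lower() for t in tokens)]
    if standard_only:
        hits = [c for c in hits if c.standard_concept == "S"]
    return hits


def validate_mapping(
    source_term: str,
    proposed_id: int,
    reasoning: str,
    candidates: list[Concept],
) -> dict:
    """Accept a proposed mapping only if its ID was returned by the lookup.

    Rejecting IDs that were never retrieved is what keeps a model from
    emitting a concept it invented.
    """
    by_id = {c.concept_id: c for c in candidates}
    if proposed_id not in by_id:
        raise ValueError(
            f"concept_id {proposed_id} was not returned by the vocabulary lookup"
        )
    chosen = by_id[proposed_id]
    # Structured output: the chosen concept plus the stated reasoning.
    return {
        "source_term": source_term,
        "concept_id": chosen.concept_id,
        "concept_name": chosen.concept_name,
        "vocabulary_id": chosen.vocabulary_id,
        "reasoning": reasoning,
    }


if __name__ == "__main__":
    candidates = lookup_concepts("diabetes type 2")
    print(validate_mapping(
        "T2DM",
        1001,
        "Abbreviation expands to type 2 diabetes mellitus.",
        candidates,
    ))
```

In this sketch the model's role would be limited to choosing among retrieved candidates and stating its reasoning; any ID outside the candidate set raises an error instead of reaching the output.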

Authors

Jaerong Ahn
McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA
Andrew Wen
Data Scientist II, University of Texas Health Sciences Center at Houston | PhD Student @ Rice
Big Data, Digital Medicine, Natural Language Processing, Clinical NLP, Information Retrieval
Nan Wang
McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA
Heling Jia
Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, USA
Zhiyi Yue
McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA
Sunyang Fu
McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA
Hongfang Liu
McWilliams School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, USA