AI Summary
Terminology mapping for the OMOP Common Data Model (CDM) is resource-intensive and error-prone, and large language models (LLMs) carry a significant hallucination risk that hinders their clinical deployment. Method: We propose a zero-training, hallucination-mitigating, proxy-based concept mapping framework grounded in the Model Context Protocol (MCP). It integrates structured prompt engineering with real-time retrieval from authoritative medical knowledge bases, enabling secure, interpretable semantic reasoning and dynamic vocabulary lookup without fine-tuning or domain-specific training. Contribution/Results: The framework is plug-and-play across multi-institutional clinical research settings. Experiments demonstrate substantial improvements in mapping accuracy and consistency, reduced manual curation effort, and robust support for both exploratory research and production-grade deployment.
Abstract
The Observational Medical Outcomes Partnership (OMOP) common data model (CDM) provides a standardized representation of heterogeneous health data to support large-scale, multi-institutional research. One critical step in data standardization using OMOP CDM is the mapping of source medical terms to OMOP standard concepts, a procedure that is resource-intensive and error-prone. While large language models (LLMs) have the potential to facilitate this process, their tendency toward hallucination makes them unsuitable for clinical deployment without training and expert validation. Here, we developed a zero-training, hallucination-preventive mapping system based on the Model Context Protocol (MCP), a standardized and secure framework allowing LLMs to interact with external resources and tools. The system enables explainable mapping and significantly improves efficiency and accuracy with minimal effort. It provides real-time vocabulary lookups and structured reasoning outputs suitable for immediate use in both exploratory and production environments.
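The idea of grounding every mapping in a vocabulary lookup can be illustrated with a minimal sketch. It is not the authors' implementation: the in-memory `VOCABULARY` table, its concept IDs, and the helper names `lookup_standard_concepts` and `validate_mapping` are all hypothetical stand-ins for a real vocabulary service exposed as a tool. Only the field names follow the OMOP CONCEPT table. The point is that a concept ID proposed by a model is accepted only if the lookup actually returned it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Concept:
    concept_id: int
    concept_name: str
    domain_id: str
    vocabulary_id: str
    standard_concept: Optional[str]  # "S" marks a standard concept in OMOP


# Hypothetical stand-in for a vocabulary service; the IDs below are made up.
VOCABULARY = [
    Concept(1001, "Type 2 diabetes mellitus", "Condition", "SNOMED", "S"),
    Concept(1002, "Essential hypertension", "Condition", "SNOMED", "S"),
    Concept(1003, "Diabetes type 2 (local code)", "Condition", "LOCAL", None),
]


def lookup_standard_concepts(term: str) -> List[Concept]:
    """Return standard concepts whose name contains every word of the term."""
    words = term.lower().split()
    return [
        c for c in VOCABULARY
        if c.standard_concept == "S"
        and all(w in c.concept_name.lower() for w in words)
    ]


def validate_mapping(source_term: str, proposed_id: int) -> dict:
    """Accept a proposed concept_id only if the lookup actually returned it."""
    candidates = lookup_standard_concepts(source_term)
    match = next((c for c in candidates if c.concept_id == proposed_id), None)
    return {
        "source_term": source_term,
        "accepted": match is not None,
        "concept": match,
        "candidates_considered": [c.concept_id for c in candidates],
    }
```

In this sketch, a proposed ID that is absent from the lookup results is rejected rather than passed downstream, and the returned dictionary records which candidates were considered so a reviewer can trace the decision.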