🤖 AI Summary
Clinical data retrieval has long been hampered by massive data volumes, loosely structured formats, and error-prone manual searching. To address this, we propose SearchAI, a generative AI–based retrieval method tailored to clinical data. Methodologically, SearchAI introduces a hierarchical generative retrieval framework that explicitly models parent–child relationships in clinical coding systems (e.g., ICD), enabling semantic expansion, synonym matching, and open-ended querying. Rather than following the conventional one-to-one code mapping paradigm, it performs traceable, exhaustive traversal of semantic paths so that no relevant node is lost. The approach integrates hierarchical graph neural networks, clinical terminology embeddings, semantic similarity search, and multi-granularity code mapping. Evaluated on both public and real-world production datasets, SearchAI outperforms default hierarchical traversals in retrieval accuracy, robustness, response latency, and scalability, particularly for million-scale code vocabularies.
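Synonym matching means that many surface forms of a query ("heart attack", "MI") must land on the same clinical concept before any code lookup happens. SearchAI does this with a generative model. The sketch below substitutes a plain lookup table purely to illustrate the input/output contract. The tables `SYNONYMS` and `CONCEPT_TO_PARENT` and the functions `normalize` and `resolve_parent` are hypothetical. The codes are illustrative ICD-10-style category codes, not a complete mapping.

```python
from typing import Optional

# Hypothetical synonym table (illustrative only): free-text variants a
# clinician might type, each mapped to one canonical concept name.
SYNONYMS = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
    "high blood pressure": "hypertension",
    "htn": "hypertension",
}

# Hypothetical concept table: each canonical concept points at the parent
# (category-level) code of its family in an ICD-10-style hierarchy.
CONCEPT_TO_PARENT = {
    "myocardial infarction": "I21",
    "hypertension": "I10",
}


def normalize(query: str) -> str:
    """Lower-case, collapse whitespace, and map known synonyms to a canonical term."""
    q = " ".join(query.lower().split())
    return SYNONYMS.get(q, q)


def resolve_parent(query: str) -> Optional[str]:
    """Return the parent code for a query, or None if the concept is unknown."""
    return CONCEPT_TO_PARENT.get(normalize(query))


if __name__ == "__main__":
    for q in ["Heart  Attack", "MI", "HTN", "fracture"]:
        print(q, "->", resolve_parent(q))
```

A learned model replaces the fixed table in practice, because no hand-written list can cover every abbreviation or misspelling. Still, the output is the same kind of object: a parent code from which the rest of the family can be retrieved.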
📝 Abstract
Artificial Intelligence (AI) is making a major impact on healthcare, particularly through its applications in natural language processing (NLP) and predictive analytics. The healthcare sector has increasingly adopted AI for tasks such as clinical data analysis and medical code assignment. However, searching for clinical information in large and often unorganized datasets remains a manual and error-prone process. Automating parts of this process can significantly improve physicians' operational productivity. In this paper, we present a generative AI approach, called SearchAI, to enhance the accuracy and efficiency of searching clinical data. Unlike traditional code assignment, which is a one-to-one problem, clinical data search is a one-to-many problem: a given search query can map to a family of codes. Healthcare professionals typically search for groups of related diseases, drugs, or conditions that map to many codes, and therefore need search tools that can handle keyword synonyms, semantic variants, and broad, open-ended queries. SearchAI employs a hierarchical model that respects the coding hierarchy and improves the traversal of relationships from parent to child nodes. SearchAI navigates these hierarchies predictively and ensures that all paths remain reachable without losing any relevant nodes. To evaluate the effectiveness of SearchAI, we conducted a series of experiments using both public and production datasets. Our results show that SearchAI outperforms default hierarchical traversals across several metrics, including accuracy, robustness, performance, and scalability. SearchAI can help make clinical data more accessible, leading to streamlined workflows, reduced administrative burden, and improved coding and diagnostic accuracy.
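The one-to-many framing can be made concrete. A one-to-one assigner returns a single code. A search query for a disease family should instead return the parent code and every descendant below it. The sketch below is a plain breadth-first expansion over a hypothetical, hand-written fragment of an ICD-10-style hierarchy (`CHILDREN` and `expand` are illustrative names, not from the paper). An exhaustive walk like this is closer to the default hierarchical traversal used as the baseline than to SearchAI's predictive navigation. It only shows what "one-to-many" and "without losing any relevant nodes" mean operationally.

```python
from collections import deque
from typing import Dict, List

# Hypothetical fragment of an ICD-10-style hierarchy (parent -> children).
# Illustrative only; a real code system has tens of thousands of nodes.
CHILDREN: Dict[str, List[str]] = {
    "I20-I25": ["I21", "I25"],
    "I21": ["I21.0", "I21.1", "I21.4"],
    "I21.0": ["I21.01", "I21.02"],
    "I25": ["I25.1"],
}


def expand(root: str) -> List[str]:
    """Return root plus every descendant, in breadth-first order.

    The `seen` set guarantees each code is emitted exactly once, even if a
    code were listed under more than one parent.
    """
    seen = {root}
    order = [root]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in CHILDREN.get(node, []):  # leaves have no entry
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


if __name__ == "__main__":
    # One query ("acute myocardial infarction" -> I21) maps to a family of codes.
    print(expand("I21"))
```

Running `expand("I21")` yields the category code followed by its subcategories and their children. A leaf such as `"I10"` in this fragment returns only itself. Exhaustive expansion becomes costly as vocabularies grow, and the abstract's scalability comparison targets exactly this cost.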