Weakly Supervised Medical Entity Extraction and Linking for Chief Complaints

📅 2025-09-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Chief complaint texts vary widely in wording and notation, and annotated data is scarce, which makes terminology standardization difficult. To address this, we propose a weakly supervised framework for entity extraction and ontology linking. A split-and-match algorithm automatically generates weak annotations (entity mention spans and class labels) on 1.2 million real-world chief complaint records, removing the need for manual annotation. A BERT-based model trained on these weak labels then locates entity mentions and links them to a pre-defined ontology. In our experiments, the method outperforms previous methods without using any human annotation.

📝 Abstract

A chief complaint (CC) is the reason for a medical visit, stated in the patient's own words. It helps medical professionals quickly understand a patient's situation and also serves as a short summary for medical text mining. However, chief complaints are entered in many different ways, which produces wide variation in medical notation and makes records difficult to standardize across medical institutions for record keeping or text mining. In this study, we propose a weakly supervised method that automatically extracts and links entities in chief complaints without human annotation. We first apply a split-and-match algorithm to produce weak annotations, including entity mention spans and class labels, on 1.2 million real-world, de-identified, IRB-approved chief complaint records. We then train a BERT-based model on the generated weak labels to locate entity mentions in chief complaint text and link them to a pre-defined ontology. In extensive experiments, our weakly supervised entity extraction and linking method outperformed previous methods without any human annotation.
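
To make the weak-annotation step concrete, here is a minimal Python sketch of one way a split-and-match pass could work. The abstract does not describe the algorithm's details, so everything here is an assumption for illustration: the delimiter set, the exact-match lookup, the toy `ONTOLOGY` dictionary, and the function name `split_and_match` are all hypothetical and are not taken from the paper.

```python
import re

# Hypothetical toy ontology (surface form -> concept label); not from the paper.
ONTOLOGY = {
    "chest pain": "CHEST_PAIN",
    "shortness of breath": "DYSPNEA",
    "sob": "DYSPNEA",
    "fever": "FEVER",
}

# Assumed delimiters between complaints: punctuation and the word "and".
SPLIT_PATTERN = re.compile(r"[,;/&+]|\band\b", flags=re.IGNORECASE)


def split_and_match(text, ontology=ONTOLOGY):
    """Split a chief complaint on delimiters and look each segment up
    in the ontology. Returns (start, end, label) character spans."""
    annotations = []
    start = 0
    # Append None as a sentinel so the final segment is also processed.
    for match in list(SPLIT_PATTERN.finditer(text)) + [None]:
        end = match.start() if match else len(text)
        segment = text[start:end]
        stripped = segment.strip()
        label = ontology.get(stripped.lower()) if stripped else None
        if label is not None:
            # Skip leading whitespace so the span covers only the mention.
            seg_start = start + (len(segment) - len(segment.lstrip()))
            annotations.append((seg_start, seg_start + len(stripped), label))
        start = match.end() if match else end
    return annotations


if __name__ == "__main__":
    print(split_and_match("Chest pain, SOB and fever"))
```

Segments with no ontology entry are simply left unlabeled in this sketch. A real pipeline would presumably need fuzzier matching to cope with misspellings and abbreviations.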

Problem

Research questions and friction points this paper is trying to address.

Extracting medical entities from chief complaints

Linking entities to an ontology without human annotation

Standardizing varied medical notations across institutions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Weakly supervised split-and-match algorithm

BERT-based model for entity extraction

Automatic linking to a pre-defined ontology
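
As a rough illustration of how weak span annotations could become training targets for a token-level model, the sketch below converts character spans into BIO tags. This is an assumption, not the paper's stated setup: the abstract does not say which tagging scheme is used, whitespace tokenization stands in for a BERT subword tokenizer, and the function name `spans_to_bio` is hypothetical.

```python
import re


def spans_to_bio(text, spans):
    """Convert (start, end, label) character spans into per-token BIO tags.

    Tokens are whitespace-delimited here; a BERT tokenizer would produce
    subword pieces instead (simplifying assumption for this sketch).
    """
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.start(), match.end()
        tag = "O"
        for start, end, label in spans:
            # A token is tagged if it overlaps the span at all.
            if tok_start < end and tok_end > start:
                prefix = "B-" if tok_start <= start else "I-"
                tag = prefix + label
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags


if __name__ == "__main__":
    weak_spans = [(0, 10, "CHEST_PAIN"), (12, 15, "DYSPNEA"), (20, 25, "FEVER")]
    for token, tag in zip(*spans_to_bio("Chest pain, SOB and fever", weak_spans)):
        print(f"{token}\t{tag}")
```

Because the class label is carried in the tag itself, a single token classifier trained on such targets would handle mention detection and ontology linking together. Whether the paper's model works this way is not stated in the abstract.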
