🤖 AI Summary
Automatic ICD coding of Chinese electronic medical records (EMRs) faces three core challenges: disease information is difficult to extract, multi-axis coding knowledge is underused, and clinical evidence is not integrated. Method: This paper proposes the first end-to-end framework that jointly incorporates knowledge from four ICD coding axes and clinical evidence verification. It introduces an axis-consistent reasoning module based on masked language modeling to jointly optimize code generation and interpretable validation; additionally, it integrates clinical evidence retrieval with a credibility scoring mechanism to explicitly model evidence–knowledge alignment. Contribution/Results: Evaluated on a large-scale, multi-center Chinese EMR dataset, the method significantly outperforms state-of-the-art approaches. Experiments in simulated real-world coding scenarios show improvements in both coding accuracy and speed for human coders, supporting its practical utility.
📝 Abstract
Automatic coding of the International Classification of Diseases (ICD) is a well-established task in the medical field and has received much attention. It has been successful for English but faces challenges with Chinese electronic medical records (EMRs). The first issue is the difficulty of extracting disease code-related information from Chinese EMRs, primarily due to their concise writing style and specific internal structure. The second is that previous methods fail to leverage disease-based multi-axial knowledge and do not associate it with the corresponding clinical evidence. This paper introduces a novel framework called MKE-Coder: Multi-axial Knowledge with Evidence verification in ICD coding for Chinese EMRs. First, we identify candidate codes for the diagnosis and decompose each of them into knowledge under four coding axes. Next, we retrieve the corresponding clinical evidence from the full content of the EMR and filter credible evidence with a scoring model. Finally, to ensure the validity of each candidate code, we propose an inference module based on the masked language modeling strategy. This module verifies that all the axis knowledge associated with the candidate code is supported by evidence and provides recommendations accordingly. To evaluate our framework, we conduct experiments on a large-scale Chinese EMR dataset collected from various hospitals. The experimental results demonstrate that MKE-Coder is significantly superior in automatic ICD coding of Chinese EMRs. In a practical evaluation within simulated real coding scenarios, our approach significantly helps coders improve both their coding accuracy and speed.
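The three stages described in the abstract (axis knowledge per candidate code, evidence retrieval with credibility filtering, and a check that every axis is supported) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the axis names, the placeholder codes `CODE-A` and `CODE-B`, the toy English sentences, and the token-overlap score. In particular, the overlap score and the `all(...)` check are simple stand-ins for the paper's learned scoring model and masked-language-modeling inference module, which are not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical axis names; the abstract only states that there are four coding axes.
AXES = ("etiology", "anatomical_site", "pathology", "clinical_manifestation")


@dataclass
class Candidate:
    code: str             # placeholder label, not a real ICD code
    axis_knowledge: dict  # axis name -> knowledge phrase for this code


def score(knowledge: str, sentence: str) -> float:
    """Toy credibility score: fraction of knowledge tokens found in the sentence.

    Stand-in for the learned scoring model mentioned in the abstract.
    """
    k_tokens = set(knowledge.lower().split())
    s_tokens = set(sentence.lower().split())
    return len(k_tokens & s_tokens) / len(k_tokens) if k_tokens else 0.0


def retrieve_evidence(knowledge, emr_sentences, threshold=0.5):
    """Return EMR sentences whose score reaches the threshold, best first."""
    scored = [(score(knowledge, s), s) for s in emr_sentences]
    return [s for sc, s in sorted(scored, reverse=True) if sc >= threshold]


def verify(candidate, emr_sentences, threshold=0.5):
    """Recommend the code only if every axis has at least one credible evidence sentence.

    Stand-in for the masked-language-modeling inference module.
    """
    support = {
        axis: retrieve_evidence(phrase, emr_sentences, threshold)
        for axis, phrase in candidate.axis_knowledge.items()
    }
    recommended = all(support[axis] for axis in candidate.axis_knowledge)
    return recommended, support


# Toy EMR content (invented for illustration).
emr = [
    "chest ct shows bacterial pneumonia in the right lower lobe",
    "patient reports fever and cough for three days",
]

cand_a = Candidate("CODE-A", dict(zip(AXES, [
    "bacterial", "right lower lobe", "pneumonia", "fever cough"])))
cand_b = Candidate("CODE-B", dict(zip(AXES, [
    "viral", "right lower lobe", "pneumonia", "fever cough"])))

ok_a, support_a = verify(cand_a, emr)
ok_b, support_b = verify(cand_b, emr)
```

In this toy setup `cand_a` is recommended because each of its four axis phrases finds supporting text, while `cand_b` is rejected because its etiology phrase has no evidence. The `support` dictionary also shows which sentences back each axis, which mirrors the interpretability goal of tying every recommendation to clinical evidence.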