MKE-Coder: Multi-Axial Knowledge with Evidence Verification in ICD Coding for Chinese EMRs

📅 2025-02-19
🤖 AI Summary
Problem: Automatic ICD coding of Chinese electronic medical records (EMRs) faces three core challenges: disease-related information is difficult to extract, multi-axial coding knowledge is underused, and candidate codes are not linked to supporting clinical evidence. Method: This paper proposes MKE-Coder, a framework that combines knowledge from four ICD coding axes with clinical evidence verification. Candidate codes are first mapped to axis knowledge; supporting evidence is then retrieved from the EMR and filtered by a credibility scoring model; finally, an inference module based on masked language modeling checks that every piece of axis knowledge for a candidate code is supported by evidence before the code is recommended. Contribution/Results: On a large-scale Chinese EMR dataset collected from multiple hospitals, the method significantly outperforms prior approaches, and an evaluation in simulated real coding scenarios shows that it helps human coders improve both coding accuracy and speed.

📝 Abstract
The task of automatically assigning International Classification of Diseases (ICD) codes is well established in the medical field and has received much attention. Automatic ICD coding has been successful for English but faces challenges with Chinese electronic medical records (EMRs). The first issue is the difficulty of extracting disease code-related information from Chinese EMRs, primarily due to their concise writing style and specific internal structure. The second is that previous methods fail to leverage disease-based multi-axial knowledge and do not associate it with the corresponding clinical evidence. This paper introduces a novel framework called MKE-Coder: Multi-axial Knowledge with Evidence verification in ICD coding for Chinese EMRs. First, we identify candidate codes for the diagnosis and categorize each of them into knowledge under four coding axes. Next, we retrieve the corresponding clinical evidence from the full content of the EMR and filter credible evidence through a scoring model. Finally, to ensure the validity of each candidate code, we propose an inference module based on the masked language modeling strategy. This module verifies that all the axis knowledge associated with the candidate code is supported by evidence and provides recommendations accordingly. To evaluate the framework, we conduct experiments on a large-scale Chinese EMR dataset collected from various hospitals. The experimental results show that MKE-Coder significantly outperforms existing methods on automatic ICD coding of Chinese EMRs. In a practical evaluation within simulated real coding scenarios, our approach significantly helps coders improve both their coding accuracy and their speed.
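The three-stage pipeline described in the abstract (candidate codes with axis knowledge, evidence retrieval and scoring, verification) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the axis names, the `Candidate` type, and the character-overlap `score_evidence` function are hypothetical stand-ins (the paper uses a trained scoring model whose details this page does not give), and the 0.6 threshold is arbitrary. The final rule, recommending a code only when every axis has at least one credible evidence sentence, is a simple all-axes check standing in for the masked-language-model inference module.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Candidate:
    """A candidate ICD code with its knowledge under each coding axis.

    Axis names are hypothetical: the abstract mentions four coding axes
    but this page does not name them.
    """
    code: str
    axis_knowledge: Dict[str, str]  # axis name -> knowledge phrase


def score_evidence(knowledge: str, sentence: str) -> float:
    """Hypothetical stand-in for the learned scoring model.

    Returns the fraction of distinct knowledge characters that also appear in
    the sentence; characters rather than words because Chinese text is not
    whitespace-delimited.
    """
    chars = set(knowledge)
    if not chars:
        return 0.0
    return len(chars & set(sentence)) / len(chars)


def credible_evidence(knowledge: str, sentences: List[str],
                      threshold: float = 0.6) -> List[str]:
    """Retrieve EMR sentences for one piece of axis knowledge; keep credible ones."""
    return [s for s in sentences if score_evidence(knowledge, s) >= threshold]


def verify(candidate: Candidate, sentences: List[str],
           threshold: float = 0.6) -> Tuple[bool, Dict[str, List[str]]]:
    """Recommend the code only if every axis has at least one credible sentence."""
    support = {
        axis: credible_evidence(text, sentences, threshold)
        for axis, text in candidate.axis_knowledge.items()
    }
    # A candidate with no axis knowledge at all is never recommended.
    return bool(support) and all(support.values()), support


if __name__ == "__main__":
    emr = ["患者发热三天", "胸片示右肺渗出影"]  # toy EMR sentences
    ok, support = verify(Candidate("CODE-1", {"site": "肺", "manifestation": "发热"}), emr)
    print(ok, support)  # True, with one supporting sentence per axis
```

Returning the per-axis evidence alongside the decision reflects the page's emphasis on linking each code to clinical evidence, so a human coder can inspect why a code was (or was not) recommended.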
Problem

Research questions and friction points this paper is trying to address.

Challenges in ICD coding for Chinese EMRs.
Difficulty extracting disease code-related information.
Leveraging multi-axial knowledge and linking it to clinical evidence.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-axial knowledge extraction for ICD coding
Evidence retrieval filtered by a credibility scoring model
Masked language modeling for code validation
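The masked-language-modeling validation idea can be illustrated with a cloze-prompt sketch. Everything here is an assumption for illustration, not the paper's specified method: the prompt template, the `[MASK]` token, the `yes`/`no` label words, and the `mask_logits` callback are hypothetical. In a real system that callback would return a masked language model's logits at the mask position (for example from a Hugging Face `AutoModelForMaskedLM`, restricted to the label-word token ids); here a toy function stands in so the example runs without a model.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical choices: a BERT-style mask token and two label words.
MASK = "[MASK]"
VERBALIZER = {"yes": True, "no": False}  # label word -> "supported?" decision

MaskLogits = Callable[[str], Dict[str, float]]


def build_prompt(knowledge: str, evidence: str) -> str:
    """Cloze prompt asking whether the evidence supports the axis knowledge."""
    return f"Knowledge: {knowledge}. Evidence: {evidence}. Supported? {MASK}."


def support_probability(knowledge: str, evidence: str, mask_logits: MaskLogits) -> float:
    """Softmax over the label words at the mask position; returns P(supported)."""
    logits = mask_logits(build_prompt(knowledge, evidence))
    scores = {word: logits[word] for word in VERBALIZER}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {word: math.exp(s - top) for word, s in scores.items()}
    total = sum(exps.values())
    return sum(v for word, v in exps.items() if VERBALIZER[word]) / total


def all_axes_supported(pairs: List[Tuple[str, str]], mask_logits: MaskLogits,
                       threshold: float = 0.5) -> bool:
    """A candidate code passes only if every (knowledge, evidence) pair is supported."""
    return bool(pairs) and all(
        support_probability(k, e, mask_logits) > threshold for k, e in pairs
    )


def toy_mask_logits(prompt: str) -> Dict[str, float]:
    """Toy stand-in for a masked LM: favors "yes" when the knowledge word
    literally appears in the evidence text."""
    knowledge = prompt.split("Knowledge: ")[1].split(". Evidence: ")[0]
    evidence = prompt.split(". Evidence: ")[1].split(". Supported?")[0]
    return {"yes": 2.0, "no": 0.0} if knowledge in evidence else {"yes": 0.0, "no": 2.0}


if __name__ == "__main__":
    pairs = [("fever", "patient has had fever for three days")]
    print(all_axes_supported(pairs, toy_mask_logits))
```

Framing verification as filling a mask lets a pretrained masked language model reuse its pretraining objective rather than learning a new classification head from scratch, which is a common motivation for prompt-based designs; whether MKE-Coder uses this exact formulation is not stated on this page.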
Xinxin You
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Xien Liu
Tsinghua University
Deep Learning, Medical NLP, Large Language Models
Xue Yang
Tsinghua-iFlytek Joint Laboratory, Iflytek, Beijing 100084, China
Ziyi Wang
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Ji Wu
Tsinghua University
Artificial Intelligence, smart healthcare, machine learning, pattern recognition, speech recognition