RAG-Coding: Enhancing LLM Medical Coding with Structured External Knowledge

📅 2026-04-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses limitations of existing large language model (LLM)-based medical coding approaches, which rely solely on the model's internal knowledge, are prone to hallucinations, and struggle to keep pace with regularly updated coding guidelines. To address these challenges, the authors propose RAG-Coding, a training-free framework that incorporates structured external knowledge into the LLM coding pipeline. Specifically, they construct a knowledge graph of the ICD taxonomy and distill the official coding guidelines into code-specific summaries; both are supplied to the LLM agents through retrieval-augmented generation. On the MDACE dataset, RAG-Coding improves micro-F1 by 8–13% over the strongest LLM-based baselines. Compared with the pretrained language model method PLM-ICD, it attains 11% higher micro recall at the cost of 6% lower micro precision, giving comparable F1. The authors also release MDACE-2025, an expert re-annotation of MDACE under the 2025 ICD-10-CM guidelines, to support evaluation against current coding standards.
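The summary names two knowledge structures: an ICD taxonomy graph and code-specific guideline summaries that are retrieved for the LLM. The sketch below is a minimal illustration, not the authors' implementation. It assumes that ICD-10-CM codes nest by prefix, so a code's parent is obtained by dropping its last character (E11.65 → E11.6 → E11). `IcdNode`, `parent_code`, `build_taxonomy`, `ancestors`, and `build_context` are hypothetical names. The titles form a tiny toy excerpt, and the guideline "summary" is a placeholder string rather than official guideline text. Given candidate codes, `build_context` assembles a retrieval context that lists each code's hierarchy path, its more specific child codes, and any attached summary. It also flags candidates absent from the table, which are possible hallucinations.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IcdNode:
    """One node of a toy ICD-10-CM taxonomy graph (hypothetical structure)."""
    code: str
    title: str
    children: list[str] = field(default_factory=list)


def parent_code(code: str) -> str | None:
    """Return the parent code by dropping the last character.

    Assumes codes nest by prefix (E11.65 -> E11.6 -> E11); levels above the
    3-character category (chapters, blocks) are not modeled.
    """
    if len(code) <= 3:
        return None
    return code[:-1].rstrip(".")


def build_taxonomy(titles: dict[str, str]) -> dict[str, IcdNode]:
    """Add a parent -> child edge for every code whose parent is in the table."""
    nodes = {code: IcdNode(code, title) for code, title in titles.items()}
    for code in nodes:
        parent = parent_code(code)
        if parent is not None and parent in nodes:
            nodes[parent].children.append(code)
    return nodes


def ancestors(code: str, nodes: dict[str, IcdNode]) -> list[str]:
    """Ancestors of `code` that exist in the table, nearest first."""
    chain: list[str] = []
    parent = parent_code(code)
    while parent is not None:
        if parent in nodes:
            chain.append(parent)
        parent = parent_code(parent)
    return chain


def build_context(candidates: list[str], nodes: dict[str, IcdNode],
                  summaries: dict[str, str]) -> str:
    """Format retrieved knowledge for candidate codes as LLM prompt context."""
    lines = []
    for code in candidates:
        if code not in nodes:
            # A candidate missing from the table may be a hallucinated code.
            lines.append(f"{code}: not found in tabular list")
            continue
        node = nodes[code]
        path = " > ".join(reversed([code] + ancestors(code, nodes)))
        lines.append(f"{code} ({node.title}); path: {path}")
        if node.children:
            lines.append(f"  more specific codes: {', '.join(node.children)}")
        if code in summaries:
            lines.append(f"  guideline summary: {summaries[code]}")
    return "\n".join(lines)


# Toy excerpt of the tabular list, for illustration only.
TITLES = {
    "E11": "Type 2 diabetes mellitus",
    "E11.6": "Type 2 diabetes mellitus with other specified complications",
    "E11.65": "Type 2 diabetes mellitus with hyperglycemia",
    "E11.9": "Type 2 diabetes mellitus without complications",
}
# Placeholder text, NOT official guideline content.
SUMMARIES = {"E11.65": "<placeholder code-specific guideline summary>"}

if __name__ == "__main__":
    nodes = build_taxonomy(TITLES)
    print(build_context(["E11.65", "E11", "E11.99"], nodes, SUMMARIES))
```

Listing child codes alongside a category such as E11 gives the model a concrete cue that a more specific code may exist. A real system would load the full Tabular List instead of a hand-written dictionary.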
📝 Abstract
We present RAG-Coding, an agentic method for automated ICD-10-CM coding. RAG-Coding orchestrates four large language model (LLM) agents and grounds their coding decisions in external knowledge sources (e.g., the official ICD-10-CM tabular list and coding guidelines). By retrieving and cross-referencing relevant knowledge from these sources, the agents enhance coding accuracy and ensure clinical compliance. On the MDACE dataset, RAG-Coding outperforms the best LLM-based baseline by 8–13% in micro-F1 and 2–8% in macro-F1 across multiple LLM backbones. Compared to the state-of-the-art pretrained language model method, PLM-ICD, RAG-Coding exhibits higher micro recall (+11%), while PLM-ICD exhibits higher micro precision (+6%), yielding comparable micro- and macro-F1. Ablations show stepwise gains, highlighting the importance of incorporating external knowledge. We also release MDACE-2025, which updates the original dataset with expert re-annotations under the latest 2025 ICD-10-CM guidelines. The update provides more fine-grained code labels and enables evaluation against current clinical standards.
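The comparison with PLM-ICD turns on the precision/recall trade-off and on the difference between micro and macro averaging. The following minimal sketch assumes that each document is represented as a set of gold codes and a set of predicted codes. It computes micro-F1 by pooling true positives, false positives, and false negatives over all codes. It computes macro-F1 as the unweighted mean of per-code F1 over every code that appears in either the gold or the predicted sets. Other macro-F1 conventions exist, so this choice is an assumption and not the paper's stated protocol. The function names and the toy predictions are invented for illustration.

```python
from __future__ import annotations

from collections import Counter


def prf(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def micro_macro(gold: list[set[str]], pred: list[set[str]]):
    """Return ((micro P, micro R, micro F1), macro F1) for multi-label coding."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same documents")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        tp.update(g & p)  # codes correctly assigned
        fp.update(p - g)  # predicted but not in gold
        fn.update(g - p)  # in gold but missed
    # Micro: pool counts across all codes, then compute one P/R/F1.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro: F1 per code, then an unweighted mean over codes.
    labels = set(tp) | set(fp) | set(fn)
    per_code_f1 = [prf(tp[c], fp[c], fn[c])[2] for c in labels]
    macro_f1 = sum(per_code_f1) / len(per_code_f1) if labels else 0.0
    return micro, macro_f1


# Toy data: three documents with hypothetical gold and predicted code sets.
GOLD = [{"E11.9", "I10"}, {"J45.909"}, {"I10", "E78.5"}]
PRED_RECALL = [{"E11.9", "I10", "E78.5"}, {"J45.909", "I10"}, {"I10", "E78.5"}]
PRED_PRECISION = [{"E11.9", "I10"}, set(), {"I10"}]

if __name__ == "__main__":
    for name, pred in [("recall-oriented", PRED_RECALL),
                       ("precision-oriented", PRED_PRECISION)]:
        (p, r, f1), macro = micro_macro(GOLD, pred)
        print(f"{name}: micro P={p:.2f} R={r:.2f} F1={f1:.2f}, macro F1={macro:.2f}")
```

On this toy data, the recall-oriented predictor reaches micro recall 1.0 with precision 5/7 (micro-F1 5/6). The precision-oriented predictor reaches micro precision 1.0 with recall 0.6 (micro-F1 0.75). Its macro-F1 drops to 0.5 because it misses two codes entirely, which shows how macro averaging weights rare codes as heavily as frequent ones.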
Problem

Research questions and friction points this paper is trying to address.

medical coding
LLM hallucination
guideline updates
external knowledge
ICD
Innovation

Methods, ideas, or system contributions that make the work stand out.

RAG-Coding
structured external knowledge
knowledge graph
medical coding
ICD guidelines