🤖 AI Summary
Medical coding requires mapping unstructured clinical notes to more than 70,000 ICD-10 diagnosis and procedure codes, and low accuracy and poor coverage on rare codes remain a critical bottleneck. This paper proposes Code Like Humans, an agentic framework for medical coding with large language models and the first solution to support the full ICD-10 code set. Specialized agents emulate core expert decision steps (including terminology standardization, rule-based reasoning, and guideline verification) while following the official coding guidelines written for human coders. Experiments show the best performance to date on rare diagnosis codes, while fine-tuned discriminative classifiers retain an advantage on the high-frequency codes to which they are limited. The paper also analyzes system performance and identifies "blind spots": codes that the system systematically undercodes.
📝 Abstract
In medical coding, experts map unstructured clinical notes to alphanumeric codes for diagnoses and procedures. We introduce Code Like Humans, a new agentic framework for medical coding with large language models. It implements the official coding guidelines written for human experts, and it is the first solution that can support the full ICD-10 coding system (more than 70K labels). It achieves the best performance to date on rare diagnosis codes, although fine-tuned discriminative classifiers retain an advantage on the high-frequency codes to which they are limited. To guide future work, we also contribute an analysis of system performance and identify its "blind spots" (codes that are systematically undercoded).