🤖 AI Summary
This study addresses the clinical challenges of complex Evaluation and Management (E/M) coding, high manual annotation burden, and low billing efficiency. We propose ProFees, a large language model (LLM)-based framework that employs multi-step reasoning and structured prompting for CPT coding. Unlike single-step prompting or opaque commercial systems, ProFees explicitly models the multidimensional E/M rules (e.g., history, physical examination, medical decision-making), enabling interpretable and verifiable automated coding. Evaluated on an expert-annotated dataset of real-world clinical documentation, ProFees achieves 89.2% coding accuracy, outperforming a leading commercial system by 36.1% and the best single-prompt baseline by 4.8%. To our knowledge, this is the first work to systematically apply a structured multi-step reasoning paradigm to E/M coding, improving accuracy, robustness, and clinical trustworthiness.
📝 Abstract
Evaluation and Management (E/M) coding, under the Current Procedural Terminology (CPT) taxonomy, documents medical services provided to patients by physicians. Because these codes are used primarily for billing, it is in physicians' best interest to provide accurate CPT E/M codes. Automating this coding task will help alleviate physicians' documentation burden, improve billing efficiency, and ultimately enable better patient care. However, a number of real-world complexities have made E/M coding automation a challenging task. In this paper, we elaborate on some of the key complexities and present ProFees, our LLM-based framework that tackles them, followed by a systematic evaluation. On an expert-curated real-world dataset, ProFees achieves an increase in coding accuracy of more than 36% over a commercial CPT E/M coding system and almost 5% over our strongest single-prompt baseline, demonstrating its effectiveness in addressing these real-world complexities.