🤖 AI Summary
This study addresses the time-consuming and inconsistent nature of manual procedural logging by radiology residents. To alleviate this burden, the authors propose using large language models (LLMs) to automatically extract structured procedural information from free-text radiology reports, replacing manual documentation. The work presents the first systematic evaluation of both local and commercial LLMs for automating medical education documentation, using instruction prompting and chain-of-thought strategies to improve information extraction. Experimental results show that the best-performing model achieves an F1-score of 0.87 with high sensitivity and specificity, suggesting the approach could substantially reduce administrative workload and improve log consistency. The evaluated models also offer workable trade-offs between inference latency and token efficiency, making the approach practical for real-world clinical deployment.
📝 Abstract
Procedural case logs are a core requirement in radiology training, yet they are time-consuming to complete and prone to inconsistency when authored manually. This study investigates whether large language models (LLMs) can automate procedural case log documentation directly from free-text radiology reports. We evaluate multiple local and commercial LLMs under instruction-based and chain-of-thought prompting to extract structured procedural information from 414 curated interventional radiology reports authored by nine residents between 2018 and 2024. Model performance is assessed using sensitivity, specificity, and F1-score, alongside inference latency and token efficiency to estimate operational cost. Results show that both local and commercial models achieve strong extraction performance, with best F1-scores approaching 0.87, while exhibiting different trade-offs between speed and cost. Automation using LLMs has the potential to substantially reduce clerical burden for trainees and improve consistency in case logging. These findings demonstrate the feasibility of AI-assisted documentation in medical education and highlight the need for further validation across institutions and clinical workflows.
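To make the evaluation metrics concrete, here is a minimal sketch (not the authors' code) of how extracted procedure labels could be scored against a manually authored gold-standard log using the sensitivity, specificity, and F1-score reported in the study. The label set, report labels, and function names are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch: scoring LLM-extracted procedure labels against a
# resident's manual log entry. Labels and label space are invented examples.

def confusion(gold: set, pred: set, label_space: set):
    """Per-report confusion counts over a fixed procedure label space."""
    tp = len(gold & pred)            # correctly extracted procedures
    fp = len(pred - gold)            # spurious extractions
    fn = len(gold - pred)            # missed procedures
    tn = len(label_space - (gold | pred))  # correctly absent labels
    return tp, fp, fn, tn

def metrics(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return sensitivity, specificity, f1

LABELS = {"biopsy", "drainage", "line_placement", "embolization"}
gold = {"biopsy", "drainage"}          # resident's manual log entry
pred = {"biopsy", "line_placement"}    # labels extracted by an LLM

sens, spec, f1 = metrics(*confusion(gold, pred, LABELS))
print(f"sensitivity={sens:.2f} specificity={spec:.2f} F1={f1:.2f}")
```

In a real pipeline these per-report counts would be accumulated across all 414 reports before computing corpus-level metrics, rather than averaged per report.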