🤖 AI Summary
This work addresses the trade-off between classification accuracy and reasoning quality in disease classification from radiology reports, where supervised fine-tuning (SFT) often improves accuracy but degrades the model's reasoning. To reconcile the two, the authors propose a two-stage approach: a lightweight large language model is first fine-tuned on disease labels, and Group Relative Policy Optimization (GRPO) is then applied to refine its predictions by optimizing accuracy and output format, without requiring explicit reasoning annotations. The study is presented as the first application of GRPO to radiology text classification. On three radiologist-annotated datasets, SFT outperforms the baseline models, and GRPO further improves classification while also increasing reasoning recall and comprehensiveness.
📝 Abstract
Accurate disease classification from radiology reports is essential for many applications. While supervised fine-tuning (SFT) of lightweight LLMs improves accuracy, it can degrade reasoning. We propose a two-stage approach: SFT on disease labels, followed by Group Relative Policy Optimization (GRPO) to refine predictions by optimizing accuracy and format without reasoning supervision. Across three radiologist-annotated datasets, SFT outperformed baselines, and GRPO further improved classification while enhancing reasoning recall and comprehensiveness.
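
To make the second stage more concrete, the following is a minimal sketch of the kind of reward signal and group-relative advantage that GRPO relies on. It is not the authors' implementation. The `<think>`/`<answer>` output template, the 0.5 format bonus, the Jaccard-based accuracy score, and the function names `reward` and `group_relative_advantages` are all assumptions chosen for illustration; the abstract states only that accuracy and format are optimized and that no reasoning annotations are used.

```python
import re
from statistics import mean, pstdev

# Hypothetical output template (an assumption, not taken from the paper):
# free-text reasoning in <think>, comma-separated disease labels in <answer>.
PATTERN = re.compile(r"^<think>(.+?)</think>\s*<answer>(.*?)</answer>$", re.DOTALL)


def reward(completion: str, gold_labels: set) -> float:
    """Format bonus (0.5, assumed weight) plus an accuracy score in [0, 1].

    The reasoning text itself is never scored, mirroring the idea of
    training without reasoning supervision.
    """
    match = PATTERN.match(completion.strip())
    if match is None:
        return 0.0  # malformed output earns no reward
    predicted = {p.strip().lower() for p in match.group(2).split(",") if p.strip()}
    union = predicted | gold_labels
    # Jaccard overlap between predicted and gold label sets (assumed metric).
    accuracy = len(predicted & gold_labels) / len(union) if union else 1.0
    return 0.5 + accuracy


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    GRPO scores several completions of the same prompt and uses the group
    mean and standard deviation in place of a learned value function.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a sample std is an equally plausible choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three sampled completions for one report whose gold labels are known.
gold = {"pneumonia", "edema"}
group = [
    "<think>Right lower lobe opacity with interstitial markings.</think>"
    "<answer>pneumonia, edema</answer>",
    "<think>Focal consolidation only.</think><answer>pneumonia</answer>",
    "pneumonia",  # ignores the template
]
rewards = [reward(c, gold) for c in group]
advantages = group_relative_advantages(rewards)
```

In this sketch the fully correct, well-formatted completion receives the largest advantage and the malformed one the smallest, so a policy-gradient update would shift probability toward the former. A full GRPO objective also includes a clipped probability ratio and a KL penalty toward a reference model, both omitted here.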