🤖 AI Summary
Generating accurate chest X-ray reports under resource-constrained settings remains challenging due to high computational costs, reliance on large-scale annotated data, and poor factual consistency—especially for rare pathologies.
Method: We propose an efficient, fact-oriented single-stage reinforcement learning framework built upon a lightweight vision-language model. It integrates GRPO (Group Relative Policy Optimization), oracle-guided preference supervision, and FactScore-driven sentence-level factual rewards—derived from atomic clinical fact extraction and entailment verification—to convert sparse failure signals into dense, clinically grounded learning feedback.
Contribution/Results: Our approach eliminates multi-stage training and large-data dependencies, achieving a new state-of-the-art F1 score of 0.341 on CheXpert Plus using only ~100–1,000 training samples—2–3 orders of magnitude fewer than prior methods. The model is deployable on commodity hardware and significantly improves clinical accuracy for rare findings while enhancing training efficiency.
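The group-relative reward normalization at the heart of GRPO can be sketched as follows. This is a minimal illustration, not the paper's implementation: `group_relative_advantages` is a hypothetical helper, and the zero-variance fallback (returning all-zero advantages, where OraPO's oracle step would instead supply preference supervision) is an assumption.

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each sampled report's reward
    against the mean and std of its own sampling group, so no learned
    value critic is needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # Degenerate group (e.g. all rollouts failed on a rare finding):
        # no gradient signal here; OraPO would fall back to oracle-guided
        # preference supervision in this case.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

# Example: one rollout scores well, three fail.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 0.0])
```

Because advantages are relative within the group, the successful rollout is pushed up and the failed ones down, without any absolute reward scale.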
📝 Abstract
Radiology report generation (RRG) aims to automatically produce clinically faithful reports from chest X-ray images. Prevailing work typically follows a scale-driven paradigm, relying on multi-stage training over large paired corpora and oversized backbones, making pipelines highly data- and compute-intensive. In this paper, we propose Oracle-educated GRPO (OraPO) with a FactScore-based reward (FactS) to tackle the RRG task under constrained budgets. OraPO enables single-stage, RL-only training by converting failed GRPO explorations on rare or difficult studies into direct preference supervision via a lightweight oracle step. FactS grounds learning in diagnostic evidence by extracting atomic clinical facts and checking entailment against ground-truth labels, yielding dense, interpretable sentence-level rewards. Together, OraPO and FactS create a compact and powerful framework that significantly improves learning efficiency on clinically challenging cases, setting a new SOTA on the CheXpert Plus dataset (0.341 in F1) with 2–3 orders of magnitude less training data, using a small base VLM on modest hardware.
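The FactS scoring scheme described above can be sketched as a per-sentence reward. This is a toy sketch under stated assumptions: `extract_facts` and `entails` stand in for the paper's atomic-fact extractor and entailment verifier, which are not specified here; both names are hypothetical.

```python
def facts_reward(report_sentences, gt_facts, extract_facts, entails):
    """FactS-style dense reward (sketch): for each generated sentence,
    extract atomic clinical facts and score the fraction entailed by
    the ground-truth facts, yielding one reward per sentence."""
    rewards = []
    for sent in report_sentences:
        facts = extract_facts(sent)
        if not facts:
            # A sentence with no extractable clinical fact earns nothing.
            rewards.append(0.0)
            continue
        supported = sum(1 for f in facts if entails(gt_facts, f))
        rewards.append(supported / len(facts))
    return rewards
```

Scoring at the sentence level is what makes the reward dense: a report that gets one rare finding right and one wrong still receives partial, localized credit, rather than a single sparse report-level score.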