OraPO: Oracle-educated Reinforcement Learning for Data-efficient and Factual Radiology Report Generation

📅 2025-09-22
🤖 AI Summary
Generating accurate chest X-ray reports under resource constraints remains challenging because of high computational cost, reliance on large-scale annotated data, and poor factual consistency, especially for rare pathologies. Method: We propose an efficient, fact-oriented, single-stage reinforcement learning framework built on a lightweight vision-language model. It combines GRPO (Group Relative Policy Optimization), oracle-guided preference supervision, and FactScore-driven sentence-level factual rewards, derived from atomic clinical fact extraction and entailment verification, to convert sparse failure signals into dense, clinically grounded learning feedback. Contribution/Results: The approach removes the need for multi-stage training and large datasets, achieving a new state-of-the-art F1 score of 0.341 on CheXpert Plus using only ~100–1,000 training samples, 2–3 orders of magnitude fewer than prior methods. The model is deployable on commodity hardware and significantly improves clinical accuracy on rare findings while making training more efficient.

📝 Abstract
Radiology report generation (RRG) aims to automatically produce clinically faithful reports from chest X-ray images. Prevailing work typically follows a scale-driven paradigm, relying on multi-stage training over large paired corpora and oversized backbones, which makes pipelines highly data- and compute-intensive. In this paper, we propose Oracle-educated GRPO (OraPO) with a FactScore-based reward (FactS) to tackle the RRG task under constrained budgets. OraPO enables single-stage, RL-only training by converting failed GRPO explorations on rare or difficult studies into direct preference supervision via a lightweight oracle step. FactS grounds learning in diagnostic evidence by extracting atomic clinical facts and checking entailment against ground-truth labels, yielding dense, interpretable sentence-level rewards. Together, OraPO and FactS form a compact and powerful framework that significantly improves learning efficiency on clinically challenging cases, setting a new state of the art on the CheXpert Plus dataset (0.341 F1) with 2–3 orders of magnitude less training data, using a small base VLM on modest hardware.
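
To make the FactS idea concrete, here is a minimal toy sketch of a sentence-level factual reward. It is not the paper's implementation: the keyword lexicon `FINDINGS`, the negation cues, the function names, and the F1-style aggregation are all assumptions chosen for illustration, with simple keyword matching standing in for real atomic fact extraction and entailment checking.

```python
import re

# Hypothetical lexicon: keyword -> label name. A real system would use a
# clinical fact extractor and an entailment model instead of keywords.
FINDINGS = {
    "effusion": "pleural_effusion",
    "pneumothorax": "pneumothorax",
    "cardiomegaly": "cardiomegaly",
    "edema": "edema",
}
# Hypothetical negation cues, padded with spaces to match whole words.
NEGATION_CUES = (" no ", " without ", " negative for ")


def extract_facts(report: str) -> list[tuple[str, bool]]:
    """Split a report into sentences and emit (label, is_present) facts."""
    facts = []
    for sentence in re.split(r"[.!?]+", report.lower()):
        padded = f" {sentence.strip()} "
        negated = any(cue in padded for cue in NEGATION_CUES)
        for keyword, label in FINDINGS.items():
            if keyword in padded:
                facts.append((label, not negated))
    return facts


def facts_reward(report: str, truth: dict[str, bool]) -> float:
    """F1-style score of extracted facts against ground-truth labels.

    The F1 aggregation is an assumption made for this sketch.
    """
    facts = extract_facts(report)
    if not facts:
        return 0.0
    # Precision: share of stated facts that agree with the labels.
    supported = sum(1 for label, present in facts if truth.get(label) == present)
    precision = supported / len(facts)
    # Recall: share of truly present findings that the report asserts.
    positives = {label for label, present in truth.items() if present}
    recalled = {label for label, present in facts if present and label in positives}
    recall = len(recalled) / len(positives) if positives else 1.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


truth = {"pleural_effusion": True, "pneumothorax": False, "cardiomegaly": False}
good = facts_reward("Small left pleural effusion. No pneumothorax.", truth)
bad = facts_reward("Cardiomegaly is present. No pneumothorax.", truth)
```

Because each sentence contributes its own facts, a report that gets some findings right and others wrong receives partial credit rather than a single pass/fail signal, which is the sense in which the reward is dense.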
Problem

Research questions and friction points this paper is trying to address.

Generate clinically accurate radiology reports from X-rays efficiently
Reduce data and computational requirements for radiology report generation
Improve learning efficiency on rare or difficult medical cases
Innovation

Methods, ideas, or system contributions that make the work stand out.

Single-stage RL training via oracle preference supervision
FactScore reward extracts clinical facts for grounding
Compact framework achieves SOTA with minimal data
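
The oracle step described above can be sketched roughly as follows. GRPO scores a group of sampled reports and normalizes each reward against the group mean and standard deviation; when every sample fails, all advantages are zero and the policy gets no learning signal. The sketch below falls back to preference pairs in that case. The failure threshold, the function names, and the choice to pair the reference report against every failed sample are assumptions for illustration, not details confirmed by the source.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one sampled group (GRPO-style advantages)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def build_training_signal(
    samples: list[str],
    rewards: list[float],
    reference_report: str,
    fail_threshold: float = 0.1,  # hypothetical cutoff for a "failed" group
) -> dict:
    """Pick between a GRPO update and oracle preference supervision."""
    if max(rewards) < fail_threshold:
        # Every exploration failed: use the ground-truth report as the
        # preferred response and each sampled report as the rejected one.
        pairs = [(reference_report, sample) for sample in samples]
        return {"mode": "preference", "pairs": pairs}
    return {"mode": "grpo", "advantages": group_relative_advantages(rewards)}


# A group where exploration found nothing useful: advantages would all be 0.
failed = build_training_signal(
    ["report a", "report b", "report c"], [0.0, 0.0, 0.0], "reference report"
)
# A group with a usable reward spread: the ordinary GRPO path applies.
mixed = build_training_signal(["report a", "report b"], [0.0, 1.0], "reference report")
```

The point of the fallback is that hard or rare studies, which would otherwise contribute nothing to the update, still yield a training signal from the reference report.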