DecomposeRL: Learning to Ask Useful, Informative, and Diverse Questions for Semi-Supervised, Traceable Claim Verification

📅 2026-05-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing fact-checking approaches struggle to balance accuracy and traceability: end-to-end models achieve high accuracy but lack interpretability, while decomposition-based methods offer transparency at the cost of accuracy. This work proposes DecomposeRL, a framework that formulates claim decomposition as a reinforcement learning policy, combining a multi-faceted reward ensemble with the GRPO algorithm to support both fully supervised and semi-supervised learning. To keep GRPO training affordable, it introduces a data-curation funnel that distills 115K fact-verification claims into a 5K-claim subset. Trained on only about 5K curated, annotated claims, DecomposeRL-7B attains balanced accuracies of 86.3 (in-domain) and 69.8 (out-of-domain) across 11 claim-verification benchmarks. Despite being roughly 4x smaller, it matches 32B-scale baselines and GPT-4.1-mini, and it outperforms baselines in a semi-supervised setting where only 10% of the claims are labeled.
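The headline numbers above are balanced accuracies. As a minimal sketch, assuming the standard definition (the mean of per-class recall) and not the paper's exact evaluation protocol, the metric can be computed as follows. The label names and toy predictions are hypothetical.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every label counts equally
    regardless of how often it appears in the data."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical toy data: an imbalanced set where the verifier
# always predicts the majority label.
y_true = ["supported", "supported", "supported", "supported", "refuted"]
y_pred = ["supported"] * 5

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

A verifier that always guesses the majority label scores 0.8 on plain accuracy here but only 0.5 on balanced accuracy, which is why the metric suits benchmarks with skewed label distributions.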
📝 Abstract
Claim verification is split between end-to-end classifiers, which are accurate but yield no inspectable traces, and decomposition-based methods, which produce inspectable traces but lag in performance on benchmark datasets. We propose DecomposeRL, an accurate claim verifier that produces inspectable traces. DecomposeRL frames decomposition as an RL policy trained with GRPO and a multi-faceted reward ensemble, enabling both fully supervised learning and semi-supervised learning from unlabeled claims. DecomposeRL addresses the prohibitive training cost of GRPO with a data-curation funnel that distills 115K fact-verification claims into a compact, learning-signal-dense subset of 5K claims. We show that a DecomposeRL-7B policy trained with full supervision on only ~5K curated claims achieves 86.3 in-domain and 69.8 out-of-domain balanced accuracy across 11 claim-verification benchmarks containing biomedical, political, scientific, and general-domain claims. Despite being 4x smaller, it matches 32B baselines and GPT-4.1-mini, and it further outperforms baselines in a semi-supervised setting with only 10% of the claims labeled. Code, data, and models are available at https://dipta007.github.io/DecomposeRL
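To illustrate how GRPO can combine with a reward ensemble, here is a minimal sketch. It assumes the commonly described GRPO formulation, in which each sampled output's reward is normalized against the mean and standard deviation of its own sampling group. The component names echo the title (useful, informative, diverse), but the component scores, weights, and helper functions below are hypothetical and are not taken from the paper.

```python
import statistics


def ensemble_reward(components, weights):
    """Weighted sum of per-facet reward scores (hypothetical weighting scheme)."""
    return sum(weights[name] * value for name, value in components.items())


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its sampling group:
    A_i = (r_i - mean(r)) / (std(r) + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical weights and scores for three decompositions of one claim.
weights = {"useful": 1.0, "informative": 0.5, "diverse": 0.5}
group = [
    {"useful": 1.0, "informative": 0.8, "diverse": 0.6},
    {"useful": 0.0, "informative": 0.4, "diverse": 0.2},
    {"useful": 1.0, "informative": 0.2, "diverse": 0.2},
]

rewards = [ensemble_reward(sample, weights) for sample in group]
advantages = group_relative_advantages(rewards)
for reward, advantage in zip(rewards, advantages):
    print(f"reward={reward:.2f} advantage={advantage:+.2f}")
```

Because advantages are computed relative to the group, no separate value network is needed. A group whose samples all receive the same reward yields zero advantage for every sample and therefore no gradient signal, which is one plausible reading of why a learning-signal-dense subset of claims matters for training cost.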
Problem

Research questions and friction points this paper is trying to address.

claim verification
inspectable traces
decomposition
semi-supervised learning
fact-checking
Innovation

Methods, ideas, or system contributions that make the work stand out.

DecomposeRL
reinforcement learning
claim verification
semi-supervised learning
interpretable reasoning