🤖 AI Summary
This work addresses a gap in fact-checking: existing models are rarely supervised to discern whether evidence genuinely supports a claim, so their decisions may not actually depend on the evidence. The authors propose a case-grounded evidence verification framework that constructs supervision signals automatically, without manual annotation: given a local case context, external evidence, and a structured claim, it generates explicit support examples alongside semantically controlled non-support examples, including counterfactual wrong-state errors and topic-related negatives. Instantiated in the radiology domain, the resulting verifier substantially outperforms case-only and evidence-only baselines: it remains robust when correct evidence is present and degrades sharply when evidence is removed or swapped, demonstrating genuine evidence dependence. The behavior also transfers to unseen evidence articles and an external case distribution, though performance drops under evidence-source shift.
📝 Abstract
Evidence-grounded reasoning requires more than attaching retrieved text to a prediction: a model should make decisions that depend on whether the provided evidence supports the target claim. In practice, this often fails because supervision is weak, evidence is only loosely tied to the claim, and evaluation does not test evidence dependence directly. We introduce case-grounded evidence verification, a general framework in which a model receives a local case context, external evidence, and a structured claim, and must decide whether the evidence supports the claim for that case. Our key contribution is a supervision construction procedure that generates explicit support examples together with semantically controlled non-support examples, including counterfactual wrong-state and topic-related negatives, without manual evidence annotation. We instantiate the framework in radiology and train a standard verifier on the resulting support task. The learned verifier substantially outperforms both case-only and evidence-only baselines, remains strong under correct evidence, and collapses when evidence is removed or swapped, indicating genuine evidence dependence. This behavior transfers across unseen evidence articles and an external case distribution, though performance degrades under evidence-source shift and remains sensitive to backbone choice. Overall, the results suggest that a major bottleneck in evidence grounding is not only model capacity, but the lack of supervision that encodes the causal role of evidence.
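The supervision-construction idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: every name, the example texts, and the assumption that counterfactual and topic-related evidence passages are available per claim are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class VerifierExample:
    case_context: str  # local case (e.g. a radiology report excerpt)
    evidence: str      # external evidence passage
    claim: str         # structured claim about the case
    label: int         # 1 = evidence supports the claim, 0 = it does not


def build_examples(case_context, claim, gold_evidence,
                   counterfactual_evidence, topic_related_evidence):
    """Generate one explicit support example plus two semantically
    controlled non-support negatives, with no manual evidence labels."""
    return [
        # explicit support: evidence that genuinely backs the claim
        VerifierExample(case_context, gold_evidence, claim, 1),
        # counterfactual wrong-state negative: evidence asserting the
        # wrong state for the same finding
        VerifierExample(case_context, counterfactual_evidence, claim, 0),
        # topic-related negative: on-topic evidence that does not bear
        # on this particular claim
        VerifierExample(case_context, topic_related_evidence, claim, 0),
    ]


examples = build_examples(
    "Chest X-ray: focal opacity in the right lower lobe.",
    "The case shows right lower lobe pneumonia.",
    "Focal lower-lobe opacity with air bronchograms suggests pneumonia.",
    "Clear lung fields without opacity argue against pneumonia.",
    "Pleural effusion appears as blunting of the costophrenic angle.",
)
print([e.label for e in examples])  # → [1, 0, 0]
```

A verifier trained on such triplets is pushed to condition its decision on the evidence itself, since the case context and claim are held fixed across the positive and the controlled negatives.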