The Alignment Bottleneck in Decomposition-Based Claim Verification

πŸ“… 2026-02-11
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This study addresses the instability of existing claim decomposition approaches when verifying complex, multifaceted claims, which the authors trace to insufficient evidence alignment and inadequate modeling of sub-claim error patterns. They introduce a new dataset featuring temporally constrained evidence and human-annotated evidence spans for sub-claims, enabling systematic evaluation of claim decomposition under two evidence alignment settings: Sub-claim Aligned Evidence (SAE) and Repeated Claim-level Evidence (SRE). The study reveals, for the first time, the critical impact of fine-grained evidence alignment and of label bias in sub-claim verification models on verification performance. Experiments demonstrate that decomposition significantly improves performance only under strictly aligned, fine-grained evidence conditions, and that incorporating an β€œabstain” strategy effectively mitigates error propagation, with consistent results across multiple datasets.

πŸ“ Abstract
Structured claim decomposition is often proposed as a solution for verifying complex, multi-faceted claims, yet empirical results have been inconsistent. We argue that these inconsistencies stem from two overlooked bottlenecks: evidence alignment and sub-claim error profiles. To better understand these factors, we introduce a new dataset of real-world complex claims, featuring temporally bounded evidence and human-annotated sub-claim evidence spans. We evaluate decomposition under two evidence alignment setups: Sub-claim Aligned Evidence (SAE) and Repeated Claim-level Evidence (SRE). Our results reveal that decomposition yields significant performance improvements only when evidence is granular and strictly aligned. By contrast, standard setups that rely on repeated claim-level evidence (SRE) fail to improve and often degrade performance, as shown across different datasets and domains (PHEMEPlus, MMM-Fact, COVID-Fact). Furthermore, we demonstrate that in the presence of noisy sub-claim labels, the nature of the error determines downstream robustness: conservative "abstention" significantly reduces error propagation compared to aggressive but incorrect predictions. These findings suggest that future claim decomposition frameworks must prioritize precise evidence synthesis and calibrate the label bias of sub-claim verification models.
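The two levers the abstract describes, fine-grained evidence alignment (SAE vs SRE) and conservative abstention, can be sketched in a few lines. The sketch below is a hypothetical illustration, not the paper's implementation: the label names, the `verify_sub_claim` callback, and the aggregation rule are all assumptions.

```python
# Hypothetical sketch of decomposition-based claim verification.
# Labels, evidence pairing, and the aggregation rule are assumed,
# not taken from the paper.

def aggregate(sub_verdicts):
    """Derive a claim-level verdict from sub-claim verdicts.

    Conservative rule: any refuted sub-claim refutes the whole claim,
    and an abstaining sub-claim downgrades the verdict to
    NOT ENOUGH INFO instead of propagating a possibly wrong label.
    """
    if not sub_verdicts:
        return "NOT ENOUGH INFO"
    if "REFUTED" in sub_verdicts:
        return "REFUTED"
    if all(v == "SUPPORTED" for v in sub_verdicts):
        return "SUPPORTED"
    return "NOT ENOUGH INFO"


def verify_claim_sae(sub_claims, aligned_evidence, verify_sub_claim):
    """SAE setup: each sub-claim is checked against its own evidence span."""
    verdicts = [verify_sub_claim(sc, ev)
                for sc, ev in zip(sub_claims, aligned_evidence)]
    return aggregate(verdicts)


def verify_claim_sre(sub_claims, claim_evidence, verify_sub_claim):
    """SRE setup: every sub-claim sees the same claim-level evidence."""
    verdicts = [verify_sub_claim(sc, claim_evidence) for sc in sub_claims]
    return aggregate(verdicts)
```

With a sub-claim verifier plugged in (here a placeholder lambda), `verify_claim_sae(["s1", "s2"], ["e1", "e2"], lambda sc, ev: "SUPPORTED")` returns `"SUPPORTED"`, while a single `"ABSTAIN"` verdict yields `"NOT ENOUGH INFO"` rather than an overconfident label.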
Problem

Research questions and friction points this paper is trying to address.

claim verification
structured decomposition
evidence alignment
sub-claim errors
error propagation
Innovation

Methods, ideas, or system contributions that make the work stand out.

evidence alignment
claim decomposition
sub-claim verification
error propagation
abstention strategy
πŸ”Ž Similar Papers
No similar papers found.