From Snippets to Semantics: Rethinking Evidence Granularity for Multilingual Fact Verification

📅 2026-05-26

🤖 AI Summary

Existing multilingual fact-checking approaches rely on fixed-granularity retrieval units (snippets, sentences, or local paragraphs) as evidence, which often leads to contextual fragmentation and incomplete evidential support. This work proposes SEEK, a framework that uses an adaptive chunking mechanism based on semantic topic boundaries to construct coherent semantic evidence blocks dynamically. SEEK pairs a multilingual encoder with a multilingual large language model fine-tuned via LoRA for veracity prediction. Evaluated on the X-FACT and RU22Fact datasets, SEEK achieves macro-F1 improvements of up to 10%, 19%, and 20% over baselines that use semantic chunking, sentence-level chunking, and search snippets, respectively, improving evidence completeness and fact-checking reliability.
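The reported gains are in macro F1, which averages per-class F1 scores so that every veracity label counts equally regardless of its frequency. The following is a minimal sketch of that metric in plain Python; the label names and the toy predictions are hypothetical and are not taken from X-FACT or RU22Fact.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only
gold = ["true", "true", "false", "false"]
pred = ["true", "false", "false", "false"]
score = macro_f1(gold, pred)
```

Because each class contributes equally, a system that does well only on the majority label is penalized, which is why macro F1 is a common choice when label distributions are skewed.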

📝 Abstract

Multilingual fact verification requires evidence that is both relevant and sufficiently complete for reliable factuality prediction. However, existing systems often rely on search snippets, sentence-level evidence, or locally segmented passages, which can miss decisive context and produce fragmented evidence. To overcome these limitations, we propose SEEK (Semantic Evidence Extraction with adaptive chunKing), a framework that constructs coherent evidence chunks from full fact-checking articles by identifying semantic topic transitions and preserving local verification context. The constructed chunks are encoded with a multilingual encoder, and multilingual LLMs are then fine-tuned with LoRA adapters for veracity prediction. Experiments on X-FACT and RU22Fact show that SEEK improves macro-F1 by up to 10% over semantic chunking, 19% over sentence chunking, and 20% over search-snippet baselines. Evidence completeness and significance analyses further show that SEEK preserves richer verification context and enables more reliable multilingual fact-checking.
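The general idea of chunking at semantic topic transitions can be illustrated with a small sketch. This is not the SEEK implementation, whose details the abstract does not give. It assumes a simple rule: start a new chunk whenever the similarity between adjacent sentences drops below a threshold. The bag-of-words vectors stand in for a multilingual encoder so the example stays self-contained, and the function names, the threshold value, and the sample sentences are all hypothetical.

```python
import math
import re
from collections import Counter


def bow_vector(sentence):
    """Toy stand-in for a multilingual encoder: lowercase word counts."""
    return Counter(re.findall(r"\w+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def chunk_by_topic(sentences, threshold=0.2):
    """Open a new chunk wherever adjacent-sentence similarity falls below threshold."""
    if not sentences:
        return []
    vectors = [bow_vector(s) for s in sentences]
    chunks = [[sentences[0]]]
    for prev, cur, sent in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev, cur) < threshold:
            chunks.append([sent])  # similarity drop: treat as a topic boundary
        else:
            chunks[-1].append(sent)  # same topic: extend the current chunk
    return [" ".join(chunk) for chunk in chunks]


# Hypothetical article sentences, for illustration only
article = [
    "The minister said unemployment fell last year.",
    "Official data show unemployment fell by two percent.",
    "The vaccine was approved in March.",
    "Regulators approved the vaccine after trials.",
]
chunks = chunk_by_topic(article)
```

With these inputs the two unemployment sentences end up in one chunk and the two vaccine sentences in another, so each chunk keeps the context around a single topic. In practice the toy vectors would be replaced by dense sentence embeddings, and the fixed threshold is only one of several possible boundary criteria.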

Problem

Research questions and friction points this paper is trying to address.

multilingual fact verification

evidence granularity

context fragmentation

fact-checking

evidence completeness

Innovation

Methods, ideas, or system contributions that make the work stand out.

semantic chunking

multilingual fact verification

adaptive evidence extraction