🤖 AI Summary
This paper addresses multilingual social media claim normalization, a critical preprocessing task for fact-checking. We propose a lightweight, retrieval-first, large language model (LLM)-assisted framework that dynamically combines in-context learning with GPT-4o-mini and nearest-neighbor retrieval from the training set. By leveraging few-shot prompting and semantic similarity matching, it efficiently maps noisy, multilingual claims to standardized canonical forms, thereby improving downstream veracity classification. Our key contributions are threefold: (1) the first integration of retrieval-augmented generation with lightweight LLMs for cross-lingual claim normalization; (2) top-ranked performance in the monolingual track, including first place in 7 of the 13 languages; and (3) empirical evidence that data-aware prompting substantially improves robustness for low-resource languages. However, zero-shot generalization remains limited, underscoring a persistent dependence on language coverage and training data distribution.
📝 Abstract
Claim normalization is an integral part of any automated fact-verification system. It converts typically noisy claim data, such as social media posts, into normalized claims, which are then fed into downstream veracity classification tasks. The CheckThat! 2025 Task 2 focuses specifically on claim normalization and spans 20 languages under monolingual and zero-shot conditions. Our proposed solution is a lightweight *retrieval-first, LLM-backed* pipeline that either dynamically prompts GPT-4o-mini with in-context examples or retrieves the closest normalization directly from the training set. On the official test set, the system ranks near the top for most monolingual tracks, achieving first place in 7 of the 13 languages. In contrast, the system underperforms in the zero-shot setting, highlighting the limitations of the proposed solution.
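The retrieval-first decision rule described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the similarity function (token-level Jaccard rather than a learned semantic embedding), the `threshold` value, the `k` setting, and the prompt format are all assumptions made for the example.

```python
# Hypothetical sketch of a retrieval-first, LLM-backed normalization pipeline.
# Similarity metric, threshold, and prompt template are illustrative assumptions.

def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity standing in for a semantic similarity model."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def normalize_claim(post, train_pairs, k=3, threshold=0.8):
    """If a near-duplicate post exists in the training set, return its stored
    normalization directly (retrieval-first, no LLM call). Otherwise build a
    few-shot prompt from the k nearest neighbors for an LLM such as GPT-4o-mini."""
    scored = sorted(train_pairs, key=lambda p: jaccard(post, p[0]), reverse=True)
    best_post, best_norm = scored[0]
    if jaccard(post, best_post) >= threshold:
        return ("retrieved", best_norm)  # close match: reuse training normalization
    examples = "\n".join(f"Post: {p}\nClaim: {c}" for p, c in scored[:k])
    prompt = f"{examples}\nPost: {post}\nClaim:"
    return ("prompt", prompt)  # caller sends this prompt to the LLM
```

A near-duplicate of a training post short-circuits to the stored normalization, while an unseen post falls through to in-context prompting; this keeps LLM usage light, which matches the paper's emphasis on a lightweight pipeline.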