🤖 AI Summary
This study addresses the automatic extraction of verifiable claims from social media text. We compare prompting-based approaches, including few-shot and in-context learning, against full-parameter fine-tuning across several large language model (LLM) families such as FLAN-T5. Experimental results show that a fine-tuned FLAN-T5 achieves the highest METEOR score, yet certain prompting methods yield claims rated higher in human evaluation, exposing a substantial misalignment between automated metrics and the requirements of factual verification. To our knowledge, this is the first empirical demonstration in claim extraction that prompting strategies can outperform fine-tuning in semantic fidelity and verifiability. The findings offer methodological guidance for the pre-processing stage of fact-checking: effective claim extraction should jointly optimize automated evaluation scores and human-judged quality, rather than maximizing metric performance in isolation.
📝 Abstract
We participate in the English track of CheckThat! Task 2 and explore various prompting and in-context learning methods, including few-shot prompting, as well as fine-tuning across different LLM families, with the goal of extracting check-worthy claims from social media passages. Our best METEOR score is achieved by fine-tuning a FLAN-T5 model. However, we observe that higher-quality claims can sometimes be extracted by other methods, even when their METEOR scores are lower.
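To make the few-shot prompting setup concrete, the sketch below shows one common way to assemble such a prompt for claim extraction. The exemplar posts, claims, and prompt wording are hypothetical illustrations, not the authors' actual prompts or CheckThat! data.

```python
# Illustrative sketch of few-shot prompting for claim extraction.
# All example posts/claims and the instruction wording are hypothetical.

# Hypothetical few-shot exemplars: (social media post, extracted claim)
FEW_SHOT_EXAMPLES = [
    ("Just read that the new vaccine is 95% effective!! wow",
     "The new vaccine is 95% effective."),
    ("cant believe the city spent $2M on that bridge lol",
     "The city spent $2 million on the bridge."),
]

def build_claim_extraction_prompt(post: str) -> str:
    """Assemble a few-shot prompt asking an LLM to extract a check-worthy claim."""
    parts = ["Extract the check-worthy claim from each social media post.\n"]
    for example_post, claim in FEW_SHOT_EXAMPLES:
        parts.append(f"Post: {example_post}\nClaim: {claim}\n")
    # The trailing "Claim:" cue prompts the model to complete the extraction.
    parts.append(f"Post: {post}\nClaim:")
    return "\n".join(parts)

prompt = build_claim_extraction_prompt("heard 3 schools closed today bc of flooding")
print(prompt)
```

The resulting string would be fed to an instruction-tuned model (e.g. FLAN-T5) as input; fine-tuning, by contrast, trains directly on (post, claim) pairs without such a template.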