UNH at CheckThat! 2025: Fine-tuning Vs Prompting in Claim Extraction

📅 2025-09-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the automatic extraction of verifiable claims from social media text. We systematically compare three paradigms on large language models (LLMs) such as FLAN-T5: few-shot prompting, in-context learning, and full-parameter fine-tuning. Experimental results show that fine-tuned FLAN-T5 achieves the highest METEOR score, yet certain prompting methods yield claims rated higher in human evaluation, exposing a substantial misalignment between automated metrics and the requirements of factual verification. To our knowledge, this is the first empirical demonstration in claim extraction that prompting strategies can offer advantages over fine-tuning in semantic fidelity and verifiability. The findings provide methodological guidance for the pre-processing stage of fact-checking: effective claim extraction must jointly optimize automated evaluation scores and human-judged quality, rather than maximizing metric performance in isolation.

📝 Abstract
We participate in the English track of CheckThat! Task 2 and explore various prompting and in-context learning methods, including few-shot prompting, as well as fine-tuning across different LLM families, with the goal of extracting check-worthy claims from social media passages. Our best METEOR score is achieved by fine-tuning a FLAN-T5 model. However, we observe that other methods can sometimes extract higher-quality claims, even when their METEOR scores are lower.
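The few-shot prompting setup mentioned in the abstract can be sketched as a simple prompt template. The instruction wording and the in-context examples below are illustrative assumptions, not the authors' actual prompts or task data.

```python
# Minimal sketch of a few-shot prompt for check-worthy claim extraction.
# Both the instruction text and the worked examples are hypothetical,
# shown only to illustrate the prompt structure.

FEW_SHOT_EXAMPLES = [
    ("Just saw that the new vaccine was approved after only 2 months of trials?!",
     "The new vaccine was approved after only two months of trials."),
    ("Can't believe gas hit $7 a gallon in my city this week... thanks a lot",
     "Gas prices reached $7 per gallon in the author's city this week."),
]

def build_prompt(passage: str) -> str:
    """Assemble an instruction, a few worked examples, and the target passage."""
    lines = ["Extract the single most check-worthy factual claim from the passage.", ""]
    for src, claim in FEW_SHOT_EXAMPLES:
        lines.append(f"Passage: {src}")
        lines.append(f"Claim: {claim}")
        lines.append("")
    lines.append(f"Passage: {passage}")
    lines.append("Claim:")  # the model continues from here
    return "\n".join(lines)

print(build_prompt("Heard the mayor cut the school budget by 40% last night."))
```

The same template generalizes to zero-shot prompting by leaving `FEW_SHOT_EXAMPLES` empty.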
Problem

Research questions and friction points this paper is trying to address.

Extracting check-worthy claims from social media passages
Comparing fine-tuning versus prompting methods for claim extraction
Evaluating LLM performance using METEOR scores and claim quality
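METEOR, the automated metric used throughout, aligns unigrams with stemming and WordNet synonym matching and applies a fragmentation penalty. The sketch below implements only its core component, a recall-weighted harmonic mean of unigram precision and recall over exact matches, so its scores will differ from the official metric; it is meant to show what the score rewards, not to reproduce it.

```python
from collections import Counter

def simple_meteor(reference: str, hypothesis: str, alpha: float = 0.9) -> float:
    """Simplified METEOR-style score: recall-weighted harmonic mean of unigram
    precision and recall (alpha=0.9 as in the original metric). Real METEOR
    also matches stems and synonyms and penalizes fragmented alignments;
    those parts are omitted here for clarity."""
    ref_counts = Counter(reference.lower().split())
    hyp_counts = Counter(hypothesis.lower().split())
    # Unigram matches, clipped by how often each word appears in the reference.
    matches = sum(min(hyp_counts[w], ref_counts[w]) for w in hyp_counts)
    if matches == 0:
        return 0.0
    precision = matches / sum(hyp_counts.values())
    recall = matches / sum(ref_counts.values())
    # F_mean = P*R / (alpha*P + (1-alpha)*R): recall dominates for alpha near 1.
    return precision * recall / (alpha * precision + (1 - alpha) * recall)

# An identical hypothesis scores 1.0 under this simplification.
print(simple_meteor("gas prices reached seven dollars",
                    "gas prices reached seven dollars"))  # → 1.0
```

Because the weighting favors recall, a verbose hypothesis that covers the reference can outscore a terse but faithful one, which is one plausible reason metric rankings and human quality judgments diverge.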
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning FLAN-T5 model for claim extraction
Comparing prompting and in-context learning methods
Evaluating claim quality beyond METEOR scores
Joe Wilder
University of New Hampshire, Durham, NH, 03824, USA
Nikhil Kadapala
University of New Hampshire, Durham, NH, 03824, USA
Benji Xu
University of New Hampshire, Durham, NH, 03824, USA
Mohammed Alsaadi
University of New Hampshire, Durham, NH, 03824, USA
Aiden Parsons
University of New Hampshire, Durham, NH, 03824, USA
Mitchell Rogers
Research Fellow, CDSAI, Victoria University of Wellington
Hyperspectral imaging · Deep learning · Genetic programming · Computer vision · AI for conservation
Palash Agarwal
University of New Hampshire, Durham, NH, 03824, USA
Adam Hassick
University of New Hampshire, Durham, NH, 03824, USA
Laura Dietz
University of New Hampshire
Information Retrieval · Knowledge Graphs · Topic Models · Neural Networks · Time Series Forecasting