🤖 AI Summary
This study addresses the automatic extraction of verifiable claims from social media text. We compare prompting-based approaches, including few-shot and in-context learning, against full-parameter fine-tuning across several large language model (LLM) families such as FLAN-T5. Experimental results show that a fine-tuned FLAN-T5 achieves the highest METEOR score, yet certain prompting methods yield claims rated higher in human evaluation, exposing a substantial misalignment between automated metrics and the requirements of factual verification. To our knowledge, this is the first empirical demonstration in claim extraction that prompting strategies can outperform fine-tuning in semantic fidelity and verifiability. The findings offer methodological guidance for the pre-processing stage of fact-checking: effective claim extraction should jointly optimize automated evaluation scores and human-judged quality, rather than maximizing metric performance in isolation.
📝 Abstract
We participate in the English track of CheckThat! Task 2 and explore various prompting and in-context learning methods, including few-shot prompting, as well as fine-tuning across different LLM families, with the goal of extracting check-worthy claims from social media passages. Our best METEOR score is achieved by fine-tuning a FLAN-T5 model. However, we observe that higher-quality claims can sometimes be extracted by other methods, even when their METEOR scores are lower.
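To make the few-shot prompting setup concrete, the sketch below shows one common way to assemble such a prompt for claim extraction. The exemplar posts, claims, and prompt wording are hypothetical illustrations, not the authors' actual prompts or CheckThat! data.

```python
# Illustrative sketch of few-shot prompting for claim extraction.
# All example posts/claims and the instruction wording are hypothetical.

# Hypothetical few-shot exemplars: (social media post, extracted claim)
FEW_SHOT_EXAMPLES = [
    ("Just read that the new vaccine is 95% effective!! wow",
     "The new vaccine is 95% effective."),
    ("cant believe the city spent $2M on that bridge lol",
     "The city spent $2 million on the bridge."),
]

def build_claim_extraction_prompt(post: str) -> str:
    """Assemble a few-shot prompt asking an LLM to extract a check-worthy claim."""
    parts = ["Extract the check-worthy claim from each social media post.\n"]
    for example_post, claim in FEW_SHOT_EXAMPLES:
        parts.append(f"Post: {example_post}\nClaim: {claim}\n")
    # The trailing "Claim:" cue prompts the model to complete the extraction.
    parts.append(f"Post: {post}\nClaim:")
    return "\n".join(parts)

prompt = build_claim_extraction_prompt("heard 3 schools closed today bc of flooding")
print(prompt)
```

The resulting string would be fed to an instruction-tuned model (e.g. FLAN-T5) as input; fine-tuning, by contrast, trains directly on (post, claim) pairs without such a template.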