Large Language Models Are Better Logical Fallacy Reasoners with Counterargument, Explanation, and Goal-Aware Prompt Formulation

📅 2025-03-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Logical fallacy detection faces challenges including fine-grained classification difficulty and poor generalization. To address these, we propose a target-aware ternary prompting structure, comprising counterargument, explanation, and argumentative goal, that enables context-driven fallacy identification in both zero-shot and fine-tuning settings without introducing new parameters, thereby departing from conventional single-sentence classification paradigms. Our method integrates multi-query generation, confidence-based ranking, and context-enhanced prompt engineering, ensuring compatibility with both GPT- and LLaMA-family models. Evaluated across five domains and 29 fallacy types on multiple benchmark datasets, it achieves zero-shot F1 gains of up to 0.60 and supervised fine-tuning F1 improvements of up to 0.45 over state-of-the-art approaches. The core contribution is the first implicit, target-guided reasoning prompting framework for large language models, offering a lightweight, generalizable, and interpretable paradigm for logical reasoning.

📝 Abstract
The advancement of Large Language Models (LLMs) has greatly improved our ability to process complex language. However, accurately detecting logical fallacies remains a significant challenge. This study presents a novel and effective prompt formulation approach for logical fallacy detection, applicable in both supervised (fine-tuned) and unsupervised (zero-shot) settings. Our method enriches input text by incorporating implicit contextual information (counterarguments, explanations, and goals), which we query for validity within the context of the argument. We then rank these queries based on confidence scores to inform classification. We evaluate our approach across multiple datasets from five domains, covering 29 distinct fallacy types, using models from the GPT and LLaMA series. The results show substantial improvements over state-of-the-art models, with F1 score increases of up to 0.60 in zero-shot settings and up to 0.45 in fine-tuned models. Extensive analyses further illustrate why and how our method excels.
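The pipeline the abstract describes (enrich the argument with counterargument, explanation, and goal contexts, query each for validity, then rank the queries by confidence to classify) can be sketched roughly as follows. The `score_with_llm` stub, its prompts, labels, and confidence values are hypothetical placeholders standing in for a real GPT- or LLaMA-family call; this is not the paper's actual prompt wording or scoring.

```python
def score_with_llm(prompt: str) -> tuple[str, float]:
    """Hypothetical stand-in for an LLM query: returns a predicted fallacy
    label and a confidence score for one context-enriched prompt. A real
    system would call a GPT- or LLaMA-family model and derive confidence
    from token log-probabilities; the toy heuristic below exists only so
    the sketch runs end to end."""
    if "counterargument" in prompt:
        return ("ad hominem", 0.91)
    if "explanation" in prompt:
        return ("ad hominem", 0.84)
    return ("straw man", 0.62)


def classify_fallacy(argument: str) -> str:
    """Build the three implicit-context queries, score each, and return
    the label of the highest-confidence query."""
    contexts = {
        "counterargument": f"Counterargument to: {argument}",
        "explanation": f"Explanation of the reasoning in: {argument}",
        "goal": f"Argumentative goal of: {argument}",
    }
    # One query per context type (multi-query generation).
    scored = [score_with_llm(f"[{name}] {text}")
              for name, text in contexts.items()]
    # Confidence-based ranking: keep the top-ranked prediction.
    label, _confidence = max(scored, key=lambda pair: pair[1])
    return label
```

In this sketch, ranking by confidence lets the context type that the model finds most decisive (here, the counterargument query) determine the final classification, mirroring the confidence-based ranking step described in the abstract.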
Problem

Research questions and friction points this paper is trying to address.

Improving logical fallacy detection in Large Language Models
Enhancing accuracy with counterargument, explanation, and goal-aware prompts
Evaluating performance across diverse datasets and fallacy types
Innovation

Methods, ideas, or system contributions that make the work stand out.

Enhances LLMs with counterargument, explanation, goal-aware prompts
Ranks queries by confidence scores for classification
Improves F1 scores in zero-shot and fine-tuned settings
Jiwon Jeong
Sungkyunkwan University, Republic of Korea
Hyeju Jang
Indiana University Indianapolis, USA
Hogun Park
Associate professor, Sungkyunkwan University (SKKU)
Data Mining · Machine Learning · Explainable AI · Graph Learning · Natural Language Processing