Reasoning Beyond Bias: A Study on Counterfactual Prompting and Chain of Thought Reasoning

📅 2024-08-16
🏛️ arXiv.org
📈 Citations: 4
Influential: 0
🤖 AI Summary
Language models rely on statistical shortcuts rather than semantic reasoning due to training data biases, leading to systematic answer preferences in MMLU; these preferences persist even under chain-of-thought (CoT) prompting and mirror human test-taking strategies. Method: We propose APriCoT (Counterfactual Prompting with Agnostically Primed CoT), a framework integrating counterfactual prompting with CoT to explicitly decouple base-rate bias from semantic reasoning. It combines counterfactual prompt engineering, base-rate modeling, and comparative analysis against human test-taking strategies. Contribution/Results: We empirically demonstrate that CoT is susceptible to confirmation bias, arguing for deliberate "System 2"-style reasoning to improve robustness. APriCoT significantly mitigates base-rate bias and improves accuracy on MMLU, establishing that bias mitigation and reasoning capability can be jointly enhanced, thereby advancing both fairness and reliability in large language model evaluation.

📝 Abstract
Language models are known to absorb biases from their training data, leading to predictions driven by statistical regularities rather than semantic relevance. We investigate the impact of these biases on answer choice preferences in the Massive Multi-Task Language Understanding (MMLU) task. Our findings reveal that differences in learned regularities across answer options are predictive of model preferences and mirror human test-taking strategies. To address this issue, we introduce two novel methods: Counterfactual Prompting with Chain of Thought (CoT) and Counterfactual Prompting with Agnostically Primed CoT (APriCoT). We demonstrate that while Counterfactual Prompting with CoT alone is insufficient to mitigate bias, our novel Primed Counterfactual Prompting with CoT approach effectively reduces the influence of base-rate probabilities while improving overall accuracy. Our results suggest that mitigating bias requires a "System-2"-like process and that CoT reasoning is susceptible to confirmation bias under some prompting methodologies. Our contributions offer practical solutions for developing more robust and fair language models.
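The core counterfactual-prompting idea, querying the model under reordered answer options and aggregating votes over option *content* rather than option *position*, can be sketched with a toy example. Everything here (the `toy_model` stand-in, its scoring scheme, and the majority-vote aggregation) is an illustrative assumption, not the paper's actual implementation:

```python
import itertools
from collections import Counter

# Hypothetical stand-in for an LLM answering a 4-way multiple-choice question.
# It mixes a weak semantic signal with a positional prior favouring slot "C"
# (index 2) -- the kind of base-rate bias the paper studies.
SEMANTIC_SCORE = {"4": 2.0, "5": 1.5}   # content -> how "plausible" it looks
POSITION_PRIOR = [0.0, 0.0, 1.0, 0.0]   # inflated preference for slot C

def toy_model(options):
    """Return the index of the option the biased toy model picks."""
    scores = [SEMANTIC_SCORE.get(opt, 0.0) + POSITION_PRIOR[i]
              for i, opt in enumerate(options)]
    return max(range(len(options)), key=scores.__getitem__)

def counterfactual_vote(options):
    """Query the model under every ordering of the options and majority-vote
    over the content chosen, cancelling out the positional prior."""
    votes = Counter()
    for perm in itertools.permutations(options):
        votes[perm[toy_model(list(perm))]] += 1
    return votes.most_common(1)[0][0]

options = ["7", "9", "5", "4"]   # "4" is correct; "5" is a tempting distractor
direct = options[toy_model(options)]     # single query, default order -> "5"
debiased = counterfactual_vote(options)  # aggregate over all 24 orderings -> "4"
```

In the default ordering the distractor "5" sits in the favoured slot and wins the single query, but across all permutations the positional prior boosts every content equally often, so the semantic signal dominates the vote.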
Problem

Research questions and friction points this paper is trying to address.

Language models absorb biases from their training data
Learned biases skew answer-choice preferences in the MMLU task
CoT prompting alone is insufficient to mitigate base-rate bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

APriCoT reduces bias via counterfactual prompting
APriCoT improves accuracy with agnostic priming
APriCoT elicits deliberate, "System-2"-style reasoning in language models