🤖 AI Summary
Large language models (LLMs) often mix factual and hallucinated statements within a single response, making it hard for humans to verify the output and base decisions on it. To address this, we propose Highlighted Chain-of-Thought Prompting (HoT), which prompts LLMs to embed XML-style tags in their responses that ground each claim in key facts highlighted in the input question. In few-shot settings, HoT outperforms vanilla chain-of-thought prompting (CoT) across 17 tasks spanning arithmetic, reading comprehension, and logical reasoning. A human study shows that the highlights help time-limited participants recognize correct LLM answers more accurately and efficiently; however, when the LLM is wrong, highlights tend to increase users' confidence in the incorrect answer. Together, these results point toward more verifiable LLM reasoning while revealing a trust-related pitfall in human-AI interaction.
📝 Abstract
An Achilles heel of Large Language Models (LLMs) is their tendency to hallucinate non-factual statements. A response that mixes factual and non-factual statements is difficult for humans to verify and to base decisions on. To combat this problem, we propose Highlighted Chain-of-Thought Prompting (HoT), a technique for prompting LLMs to generate responses with XML tags that ground facts to those provided in the query. That is, given an input question, the LLM first re-formats the question to add XML tags highlighting key facts, and then generates a response that highlights the facts referenced from the input. Interestingly, in few-shot settings, HoT outperforms vanilla chain-of-thought prompting (CoT) on a wide range of 17 tasks, from arithmetic and reading comprehension to logical reasoning. When asked to verify LLM responses, time-limited participants recognize correct LLM answers more accurately and efficiently when highlights are present. Yet, surprisingly, when LLMs are wrong, the highlights tend to make users believe that an answer is correct.
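To make the tagging scheme concrete, below is a minimal sketch of how the XML-style grounding could be checked programmatically. The `<factN>...</factN>` tag names, the example question, and the `grounded` helper are illustrative assumptions, not the paper's actual implementation: the idea is simply that every fact tag used in the response should refer back to a fact highlighted in the re-formatted question.

```python
import re

# Matches XML-style fact tags such as <fact1>...</fact1>; the \1 backreference
# ensures opening and closing tag numbers agree. Tag naming is an assumption.
TAG_RE = re.compile(r"<fact(\d+)>(.*?)</fact\1>", re.DOTALL)

def extract_facts(text):
    """Return {tag_id: highlighted span} for every <factN>...</factN> tag."""
    return {int(m.group(1)): m.group(2) for m in TAG_RE.finditer(text)}

def grounded(question, answer):
    """True iff every fact tag in the answer also appears in the question."""
    return set(extract_facts(answer)) <= set(extract_facts(question))

# Hypothetical HoT-style exchange: the question is re-formatted with
# highlighted key facts, and the response references those same tags.
question = ("<fact1>James writes a 3-page letter to 2 friends</fact1> "
            "<fact2>twice a week</fact2>. How many pages does he write a year?")
answer = ("Per sitting he writes <fact1>3 * 2 = 6</fact1> pages, "
          "<fact2>twice a week</fact2>, so 6 * 2 * 52 = 624 pages a year.")

print(grounded(question, answer))  # every answer tag is grounded in the input
```

A check like this could also serve as a lightweight filter: responses citing a fact tag that was never highlighted in the input are candidates for closer human review.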