🤖 AI Summary
This study investigates when large language models (LLMs) rely on factual knowledge and when they default to superficial heuristic reasoning in entity comparison tasks. Method: Through systematic experiments and logistic regression analysis, we identify three salient heuristic biases (popularity, mention order, and semantic co-occurrence) and show that shallow models using only these cues predict small-scale LLMs' choices more accurately than the LLMs' own numerical knowledge does. We further introduce numerically grounded chain-of-thought prompting to steer models of all sizes toward reliable numerical reasoning. Contribution/Results: We provide the first empirical evidence that LLMs possess a selective knowledge activation capability, present in large models but absent in small ones, and propose an interpretable, intervenable framework for diagnosing and correcting these cognitive biases. Our approach enables transparent bias attribution and targeted mitigation, advancing both the understanding and the controllability of LLM reasoning.
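The "numerically grounded" prompting idea above can be illustrated with a minimal sketch. The function name and the exact prompt wording below are illustrative assumptions, not the paper's actual template; the core idea is simply to make the model state each entity's numerical attribute before comparing.

```python
# Hypothetical sketch of a numerically grounded chain-of-thought prompt.
# The wording is illustrative, not the paper's exact template.
def grounded_cot_prompt(attribute: str, entity_a: str, entity_b: str) -> str:
    return (
        f"Question: Which has the greater {attribute}, {entity_a} or {entity_b}?\n"
        f"First, state the {attribute} of {entity_a}.\n"
        f"Then, state the {attribute} of {entity_b}.\n"
        "Finally, compare the two numbers and answer with the entity name."
    )

prompt = grounded_cot_prompt("length", "the Danube", "the Nile")
print(prompt)
```

Eliciting the numbers explicitly before the comparison is what steers the model toward its numerical knowledge rather than surface cues.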
📝 Abstract
Large Language Models (LLMs) are increasingly used for knowledge-based reasoning tasks, yet understanding when they rely on genuine knowledge versus superficial heuristics remains challenging. We investigate this question through entity comparison tasks, asking models to compare entities along numerical attributes (e.g., "Which river is longer, the Danube or the Nile?"), which offer clear ground truth for systematic analysis. Despite having sufficient numerical knowledge to answer correctly, LLMs frequently make predictions that contradict this knowledge. We identify three heuristic biases that strongly influence model predictions: entity popularity, mention order, and semantic co-occurrence. For smaller models, a simple logistic regression using only these surface cues predicts model choices more accurately than the model's own numerical predictions, suggesting heuristics largely override principled reasoning. Crucially, we find that larger models (32B parameters) selectively rely on numerical knowledge when it is more reliable, while smaller models (7–8B parameters) show no such discrimination, which explains why larger models outperform smaller ones even when the smaller models possess more accurate knowledge. Chain-of-thought prompting steers models of all sizes toward using the numerical features.
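The diagnostic described above (a logistic regression over surface cues that predicts a model's choices) can be sketched as follows. This is a minimal illustration on synthetic data: the feature names, weights, and the simulated "heuristic-driven model" are assumptions for demonstration, not the paper's data or exact features.

```python
# Sketch: predict an LLM's choice in "which is larger, A or B?" comparisons
# from three surface cues only. All data here is synthetic and illustrative.
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical surface features for each entity pair (A, B):
pop_gap = rng.normal(size=n)                # popularity(A) - popularity(B)
first_mention = rng.choice([0, 1], size=n)  # 1 if A is mentioned first
cooccur = rng.normal(size=n)                # co-occurrence score for A vs. B

# Simulate a heuristic-driven model that picks A mostly from surface cues.
true_logits = 1.5 * pop_gap + 0.8 * first_mention + 0.6 * cooccur
picked_A = (rng.random(n) < 1 / (1 + np.exp(-true_logits))).astype(float)

X = np.column_stack([pop_gap, first_mention, cooccur])

def fit_logreg(X, y, lr=0.1, steps=2000):
    """Plain gradient-descent logistic regression (no external ML library)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1 / (1 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

w, b = fit_logreg(X, picked_A)
pred = 1 / (1 + np.exp(-(X @ w + b))) > 0.5
acc = float(np.mean(pred == picked_A))
print(f"surface-cue predictor accuracy: {acc:.2f}")
```

If such a cue-only regression beats the model's own numerical knowledge at predicting its answers, as the abstract reports for smaller models, the choices are being driven by heuristics rather than by the numbers.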