🤖 AI Summary
This study identifies a structural deficiency in large language models (LLMs) with respect to historical Olympic medal knowledge: while LLMs accurately retrieve national medal counts (Task 1, >90% accuracy), they exhibit severe limitations in ranking reasoning (Task 2, <35% accuracy). We construct a systematic, fine-grained Olympic medal dataset and evaluate multiple state-of-the-art LLMs via zero-shot prompting. Our empirical analysis reveals, for the first time, that LLMs’ knowledge representation is biased toward factual recall rather than relational reasoning: they reliably encode “how many” but fail to consistently infer “which rank.” This finding points to a fundamental divergence between LLMs’ internal knowledge organization and human-like structured reasoning, challenging the implicit assumption that LLMs serve as general-purpose reasoning engines. To support reproducible evaluation of structured reasoning capabilities, we publicly release all code, data, and model outputs, establishing a benchmark for assessing relational inference in foundation models.
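The two task formats contrasted above can be sketched as zero-shot prompt templates. This is a minimal illustration of the count-vs-rank distinction, not the paper's actual prompt wording; the function names and phrasing are hypothetical:

```python
def count_prompt(team: str, year: int) -> str:
    # Task 1 (factual recall): query the medal count of a single team.
    return (f"How many gold, silver, and bronze medals did {team} "
            f"win at the {year} Summer Olympics?")

def rank_prompt(rank: int, year: int) -> str:
    # Task 2 (relational reasoning): query which team held a given
    # position, which requires comparing counts across all teams.
    return (f"Which team finished in position {rank} of the medal "
            f"table at the {year} Summer Olympics?")
```

The asymmetry in difficulty arises because Task 1 can be answered from a single memorized fact, whereas Task 2 implicitly requires aggregating and ordering many such facts.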
📝 Abstract
Large language models (LLMs) have become a dominant approach in natural language processing, yet their internal knowledge structures remain largely unexplored. In this paper, we analyze the internal knowledge structures of LLMs using historical medal tallies from the Olympic Games. We task the models with providing the medal counts for each team and identifying which teams achieved specific rankings. Our results reveal that while state-of-the-art LLMs perform remarkably well in reporting medal counts for individual teams, they struggle significantly with questions about specific rankings. This suggests that the internal knowledge structures of LLMs are fundamentally different from those of humans, who can easily infer rankings from known medal counts. To support further research, we publicly release our code, dataset, and model outputs.
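The inference that humans find easy, deriving a ranking from known medal counts, amounts to a lexicographic sort. A minimal sketch, using a hypothetical tally rather than the released dataset, and assuming the standard Olympic convention of ranking by golds, then silvers, then bronzes:

```python
# Hypothetical medal tally: team -> (gold, silver, bronze).
tally = {
    "USA":   (39, 41, 33),
    "China": (38, 32, 19),
    "Japan": (27, 14, 17),
}

# Tuples compare lexicographically, so sorting by the (gold, silver,
# bronze) triple in descending order yields the conventional ranking.
ordered = sorted(tally, key=lambda team: tally[team], reverse=True)
ranking = {team: pos + 1 for pos, team in enumerate(ordered)}
```

Given the counts, the rank of any team follows mechanically; the study's finding is that LLMs do not reliably perform this step even when they can report the counts themselves.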