Do LLMs Dream of Ontologies?

📅 2024-01-26
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
This study investigates large language models' (LLMs) capacity to memorize and generalize structured knowledge, specifically concept ID–label mappings from open ontologies, and how that capacity depends on web popularity. We systematically evaluate ID retrieval accuracy across multi-source ontologies (GO, Uberon, Wikidata, ICD-10) for Pythia, GPT-series, and Gemini models. We introduce *prediction invariance*, a robustness metric that quantifies model stability under prompt perturbations, temperature variation, and cross-model transfer. Integrating web frequency statistics, we find a significant positive correlation between ID retrieval accuracy and concept-level webpage occurrence counts. The results indicate that LLMs acquire ontology knowledge predominantly through unstructured textual exposure rather than direct ingestion of structured resources; GPT-4 achieves the highest accuracy, yet overall performance remains limited. This work is the first to uncover a popularity-driven mechanism underlying ontology knowledge internalization in LLMs, and it establishes a new paradigm for knowledge traceability and trustworthy model evaluation.

📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable performance across diverse natural language processing tasks, yet their ability to memorize structured knowledge remains underexplored. In this paper, we investigate the extent to which general-purpose pre-trained LLMs retain and correctly reproduce concept identifier (ID)-label associations from publicly available ontologies. We conduct a systematic evaluation across multiple ontological resources, including the Gene Ontology, Uberon, Wikidata, and ICD-10, using LLMs such as Pythia-12B, Gemini-1.5-Flash, GPT-3.5, and GPT-4. Our findings reveal that only a small fraction of ontological concepts is accurately memorized, with GPT-4 demonstrating the highest performance. To understand why certain concepts are memorized more effectively than others, we analyze the relationship between memorization accuracy and concept popularity on the Web. Our results indicate a strong correlation between the frequency of a concept's occurrence online and the likelihood of accurately retrieving its ID from the label. This suggests that LLMs primarily acquire such knowledge through indirect textual exposure rather than directly from structured ontological resources. Furthermore, we introduce new metrics to quantify prediction invariance, demonstrating that the stability of model responses across variations in prompt language and temperature settings can serve as a proxy for estimating memorization robustness.
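The abstract does not fully specify how prediction invariance is computed. As a minimal sketch under that assumption, one simple "all-agree" variant scores the fraction of concepts whose predicted ID is identical across every prompt or temperature variant (the function name and the example GO IDs below are illustrative, not taken from the paper):

```python
def prediction_invariance(predictions_per_concept):
    """Fraction of concepts whose predicted ID is identical across
    all prompt/temperature variants (a strict all-agree criterion).

    `predictions_per_concept` maps a concept label to the list of IDs
    the model returned under the different query conditions.
    """
    if not predictions_per_concept:
        return 0.0
    stable = sum(
        1
        for preds in predictions_per_concept.values()
        if len(set(preds)) == 1  # same ID under every variant
    )
    return stable / len(predictions_per_concept)

# Hypothetical model outputs for two Gene Ontology labels,
# each queried with three prompt variants.
preds = {
    "mitochondrion": ["GO:0005739", "GO:0005739", "GO:0005739"],  # stable
    "nucleolus":     ["GO:0005730", "GO:0005634", "GO:0005730"],  # unstable
}
print(prediction_invariance(preds))  # 0.5
```

Softer variants are possible, e.g. scoring each concept by the relative frequency of its majority answer instead of requiring unanimity.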
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Structured Knowledge
Popularity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Structured Information Recall
Reliability of Memory