Knowledge Boundary of Large Language Models: A Survey

📅 2024-12-17
🏛️ arXiv.org
📈 Citations: 8
Influential: 0
🤖 AI Summary
Large language models (LLMs) store a vast amount of knowledge in their parameters, yet they memorize and use some of it unreliably, which leads to untruthful or inaccurate responses. The notion of an LLM's knowledge boundary remains inadequately defined in existing research. This survey proposes a comprehensive definition of the LLM knowledge boundary and a formalized taxonomy that sorts knowledge into four types. On that basis it reviews the field through three lenses: why knowledge boundaries matter (motivation), how to identify them, and how to mitigate the problems they cause. It closes with open challenges and potential research directions.

📝 Abstract
Although large language models (LLMs) store a vast amount of knowledge in their parameters, they still have limitations in the memorization and utilization of certain knowledge, leading to undesired behaviors such as generating untruthful and inaccurate responses. This highlights the critical need to understand the knowledge boundary of LLMs, a concept that remains inadequately defined in existing research. In this survey, we propose a comprehensive definition of the LLM knowledge boundary and introduce a formalized taxonomy categorizing knowledge into four distinct types. Using this foundation, we systematically review the field through three key lenses: the motivation for studying LLM knowledge boundaries, methods for identifying these boundaries, and strategies for mitigating the challenges they present. Finally, we discuss open challenges and potential research directions in this area. We aim for this survey to offer the community a comprehensive overview, facilitate access to key issues, and inspire further advancements in LLM knowledge research.
Problem

Research questions and friction points this paper is trying to address.

Define knowledge boundary of LLMs
Identify limitations in LLM knowledge utilization
Propose mitigation strategies for knowledge boundary challenges
Innovation

Methods, ideas, or system contributions that make the work stand out.

Define LLM knowledge boundary comprehensively
Categorize knowledge into four distinct types
Review motivation, identification, and mitigation strategies
Authors

Moxin Li
National University of Singapore
natural language processing

Yong Zhao
National University of Singapore

Yang Deng
Singapore Management University

Wenxuan Zhang

Shuaiyi Li
The Chinese University of Hong Kong

Wenya Xie
The Chinese University of Hong Kong, Shenzhen

See-Kiong Ng
School of Computing and Institute of Data Science, National University of Singapore
artificial intelligence, natural language processing, data mining, smart cities, bioinformatics

Tat-Seng Chua
National University of Singapore