Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations

📅 2025-04-18
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the underexplored problem of weak knowledge boundary identification and high hallucination risk in large language models (LLMs) for low-resource languages. We conduct the first systematic cross-lingual analysis of knowledge boundary cognition mechanisms. Our method introduces a training-free cross-lingual representation alignment technique to bridge semantic disparities across languages. We further construct KBench, the first multilingual knowledge boundary evaluation suite—comprising three distinct boundary sample categories—to rigorously assess boundary awareness. Using internal representation probing, we reveal that knowledge boundary signals are predominantly encoded in middle-to-upper transformer layers and exhibit strong cross-lingual transferability. Empirical results demonstrate that our approach significantly mitigates hallucinations in low-resource languages. All code and the multilingual evaluation dataset are publicly released.

Technology Category

Application Category

📝 Abstract
While understanding the knowledge boundaries of LLMs is crucial to prevent hallucination, research on knowledge boundaries of LLMs has predominantly focused on English. In this work, we present the first study to analyze how LLMs recognize knowledge boundaries across different languages by probing their internal representations when processing known and unknown questions in multiple languages. Our empirical studies reveal three key findings: 1) LLMs' perceptions of knowledge boundaries are encoded in the middle to middle-upper layers across different languages. 2) Language differences in knowledge boundary perception follow a linear structure, which motivates our proposal of a training-free alignment method that effectively transfers knowledge boundary perception ability across languages, thereby helping reduce hallucination risk in low-resource languages; 3) Fine-tuning on bilingual question pair translation further enhances LLMs' recognition of knowledge boundaries across languages. Given the absence of standard testbeds for cross-lingual knowledge boundary analysis, we construct a multilingual evaluation suite comprising three representative types of knowledge boundary data. Our code and datasets are publicly available at https://github.com/DAMO-NLP-SG/LLM-Multilingual-Knowledge-Boundaries.
Problem

Research questions and friction points this paper is trying to address.

Analyzing LLMs' knowledge boundaries across multiple languages
Transferring knowledge boundary perception to reduce hallucination
Enhancing cross-lingual boundary recognition via bilingual fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Probing internal representations for multilingual knowledge boundaries
Training-free alignment method transfers boundary perception
Fine-tuning bilingual pairs enhances cross-lingual boundary recognition
🔎 Similar Papers
No similar papers found.
Chenghao Xiao
Chenghao Xiao
Durham University
Natural Language ProcessingInformation RetrievalRepresentation Learning
Hou Pong Chan
Hou Pong Chan
Language Technology Lab, Alibaba DAMO Academy
Natural Language GenerationNatural Language ProcessingMachine LearningData Mining
H
Hao Zhang
DAMO Academy, Alibaba Group
M
Mahani Aljunied
DAMO Academy, Alibaba Group
L
Li Bing
DAMO Academy, Alibaba Group
N
N. A. Moubayed
Department of Computer Science, Durham University
Y
Yu Rong
DAMO Academy, Alibaba Group