Explicit vs. Implicit Biographies: Evaluating and Adapting LLM Information Extraction on Wikidata-Derived Texts

📅 2025-09-18
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of implicit semantic reasoning in information extraction (IE). We systematically investigate the performance disparity of large language models (LLMs) on explicit versus implicit biographical texts. To this end, we introduce the first 10k-scale multimodal biographical dataset with explicit/implicit relation annotations, revealing substantial performance degradation—up to 30% F1 drop—across mainstream LLMs (LLaMA-2.3, DeepSeek-V1, Phi-1.5) on implicit relation identification. To bridge this gap, we propose a lightweight low-rank adaptation (LoRA)-based fine-tuning framework specifically designed for modeling implicit relational semantics, enhancing both generalization and interpretability. Extensive experiments demonstrate that LoRA fine-tuning yields an average 12.7% F1 improvement over base LLMs and significantly boosts zero-shot inference capability on unseen implicit patterns. Our results empirically validate the effectiveness and scalability of parameter-efficient adaptation for implicit semantic understanding in IE.

Technology Category

Application Category

📝 Abstract
Text Implicitness has always been challenging in Natural Language Processing (NLP), with traditional methods relying on explicit statements to identify entities and their relationships. From the sentence "Zuhdi attends church every Sunday", the relationship between Zuhdi and Christianity is evident for a human reader, but it presents a challenge when it must be inferred automatically. Large language models (LLMs) have proven effective in NLP downstream tasks such as text comprehension and information extraction (IE). This study examines how textual implicitness affects IE tasks in pre-trained LLMs: LLaMA 2.3, DeepSeekV1, and Phi1.5. We generate two synthetic datasets of 10k implicit and explicit verbalization of biographic information to measure the impact on LLM performance and analyze whether fine-tuning implicit data improves their ability to generalize in implicit reasoning tasks. This research presents an experiment on the internal reasoning processes of LLMs in IE, particularly in dealing with implicit and explicit contexts. The results demonstrate that fine-tuning LLM models with LoRA (low-rank adaptation) improves their performance in extracting information from implicit texts, contributing to better model interpretability and reliability.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM performance on implicit vs explicit information extraction
Analyzing impact of fine-tuning on implicit reasoning generalization
Improving model interpretability in handling textual implicitness challenges
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning LLMs with LoRA adaptation
Synthetic datasets for implicit/explicit verbalization
Improving information extraction from implicit texts
🔎 Similar Papers
No similar papers found.
A
Alessandra Stramiglio
DISI Department of Computer Science and Engineering, University of Bologna, Italy
A
Andrea Schimmenti
DISI Department of Computer Science and Engineering, University of Bologna, Italy
V
Valentina Pasqual
Digital Humanities Advanced Research Center (/DH.arc), Department of Classical Philology and Italian Studies, University of Bologna, Italy
Marieke van Erp
Marieke van Erp
KNAW Humanities Cluster
text miningcultural heritagesemantic webcultural aidigital humanities
Francesco Sovrano
Francesco Sovrano
ETH Zurich, Collegium Helveticum
AI for Software EngineeringResponsible AIAI and LawXAITheory of Explanations
Fabio Vitali
Fabio Vitali
Professor of Computer Science, University of Bologna
Markup languagessemantic webweb technologiesfolksonomiesversioning