Knowledge Extraction on Semi-Structured Content: Does It Remain Relevant for Question Answering in the Era of LLMs?

📅 2025-09-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In the era of large language models (LLMs), it remains unclear whether knowledge extraction from semi-structured content retains value for question answering (QA). Method: This paper systematically investigates the feasibility and efficacy of integrating knowledge triple extraction with LLMs. We extend existing benchmarks with fine-grained triple annotations and propose a unified framework that jointly leverages knowledge extraction, context augmentation, and multi-task learning. Experiments are conducted across multiple commercial and open-source LLMs at varying scales. Contribution/Results: While large-scale triple extraction remains challenging, the extracted symbolic knowledge consistently improves LLM-based QA performance—especially under low-resource conditions. Our findings empirically validate the enduring complementary value of symbolic knowledge and neural models, offering a lightweight, efficient pathway for knowledge-enhanced LLMs.

Technology Category

Application Category

📝 Abstract
The advent of Large Language Models (LLMs) has significantly advanced web-based Question Answering (QA) systems over semi-structured content, raising questions about the continued utility of knowledge extraction for question answering. This paper investigates the value of triple extraction in this new paradigm by extending an existing benchmark with knowledge extraction annotations and evaluating commercial and open-source LLMs of varying sizes. Our results show that web-scale knowledge extraction remains a challenging task for LLMs. Despite achieving high QA accuracy, LLMs can still benefit from knowledge extraction, through augmentation with extracted triples and multi-task learning. These findings provide insights into the evolving role of knowledge triple extraction in web-based QA and highlight strategies for maximizing LLM effectiveness across different model sizes and resource settings.
Problem

Research questions and friction points this paper is trying to address.

Investigating knowledge extraction relevance for QA with LLMs
Evaluating triple extraction value in modern question answering
Assessing LLM performance on web-scale knowledge extraction tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Augmenting LLMs with extracted knowledge triples
Using multi-task learning to enhance QA performance
Evaluating triple extraction across varying model sizes
🔎 Similar Papers
No similar papers found.
K
Kai Sun
Meta Reality Labs, USA
Yin Huang
Yin Huang
Research Assistant, University of Florida
Multi-Armed BanditsEdge ComputingWireless CommunicationsQuantum Networking
S
Srishti Mehra
Meta Reality Labs, USA
M
Mohammad Kachuee
Meta Reality Labs, USA
Xilun Chen
Xilun Chen
Meta FAIR
Natural Language ProcessingMachine Learning
R
Renjie Tao
Meta Reality Labs, USA
Zhaojiang Lin
Zhaojiang Lin
Research Scientist, Facebook Reality Labs
Dialogue SystemNatural Language ProcessingDeep Learning
A
Andrea Jessee
Meta Reality Labs, USA
N
Nirav Shah
Meta Reality Labs, USA
A
Alex Betty
Meta Reality Labs, USA
Y
Yue Liu
Meta Reality Labs, USA
A
Anuj Kumar
Meta Reality Labs, USA
W
Wen-tau Yih
FAIR, Meta, USA
Xin Luna Dong
Xin Luna Dong
ACM / IEEE Fellow, Principal Scientist at Meta
Knowledge graphData qualityNLPSearch