A Dynamic Self-Evolving Extraction System

📅 2026-03-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes DySECT, the first dynamic information extraction system to enable co-evolution of knowledge and extraction, addressing challenges such as the dynamic evolution of domain-specific terminology, lagging expert taxonomies, and difficulty in recognizing rare terms. DySECT continuously constructs a knowledge base from triples extracted by large language models, integrates probabilistic knowledge representation with graph-based reasoning to support autonomous knowledge expansion, and enhances the extraction model through prompt tuning, few-shot example sampling, or fine-tuning on synthetic data, establishing a closed-loop "extraction–knowledge" reinforcement mechanism. Experiments in dynamic domains including healthcare, legal, and human resources show that DySECT significantly improves both the accuracy and timeliness of information extraction, achieving continuous self-optimization of system capabilities.

📝 Abstract
The extraction of structured information from raw text is a fundamental component of many NLP applications, including document retrieval, ranking, and relevance estimation. High-quality extractions often require domain-specific accuracy, up-to-date understanding of specialized taxonomies, and the ability to incorporate emerging jargon and rare outliers. In many domains, such as medical, legal, and HR, the extraction model must also adapt to shifting terminology and benefit from explicit reasoning over structured knowledge. We propose DySECT, a Dynamic Self-Evolving Extraction and Curation Toolkit, which continually improves as it is used. The system incrementally populates a versatile, self-expanding knowledge base (KB) with triples extracted by the LLM. The KB further enriches itself through the integration of probabilistic knowledge and graph-based reasoning, gradually accumulating domain concepts and relationships. The enriched KB then feeds back into the LLM extractor via prompt tuning, sampling of relevant few-shot examples, or fine-tuning using KB-derived synthetic data. As a result, the system forms a symbiotic closed-loop cycle in which extraction continuously improves knowledge, and knowledge continuously improves extraction.
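The closed-loop cycle the abstract describes can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the LLM extractor is stubbed with a toy rule, the KB is a triple-count dictionary, "probabilistic knowledge" is reduced to frequency-based confidence, "graph-based reasoning" to transitivity over `is_a` edges, and the feedback path to sampling the most confident triples as few-shot examples. All names (`stub_llm_extract`, `KnowledgeBase`, `top_examples`) are hypothetical.

```python
from collections import defaultdict

def stub_llm_extract(text, few_shot_examples):
    """Stand-in for the LLM triple extractor (hypothetical toy rule)."""
    # Toy heuristic: "X is a Y" -> (X, is_a, Y); a real system would
    # prompt an LLM, conditioning on the few-shot examples.
    triples = []
    for sentence in text.split("."):
        words = sentence.strip().split()
        if len(words) >= 4 and words[1:3] == ["is", "a"]:
            triples.append((words[0], "is_a", " ".join(words[3:])))
    return triples

class KnowledgeBase:
    """Self-expanding KB of (head, relation, tail) triples with counts."""
    def __init__(self):
        self.counts = defaultdict(int)  # triple -> observation count

    def add(self, triples):
        for t in triples:
            self.counts[t] += 1

    def confidence(self, triple):
        # Crude probabilistic weighting: frequency over all observations.
        total = sum(self.counts.values())
        return self.counts[triple] / total if total else 0.0

    def infer(self):
        # Minimal graph-based reasoning: transitive closure of is_a.
        edges = [t for t in self.counts if t[1] == "is_a"]
        new = {(h1, "is_a", t2)
               for h1, _, t1 in edges
               for h2, _, t2 in edges
               if t1 == h2 and (h1, "is_a", t2) not in self.counts}
        self.add(new)
        return new

    def top_examples(self, k=3):
        # Feedback path: most-observed triples become few-shot examples.
        return sorted(self.counts, key=self.counts.get, reverse=True)[:k]

# One iteration of the loop: extract -> populate KB -> enrich -> feed back.
kb = KnowledgeBase()
for doc in ["Aspirin is a NSAID. NSAID is a drug."]:
    kb.add(stub_llm_extract(doc, kb.top_examples()))
inferred = kb.infer()         # KB enriches itself: (Aspirin, is_a, drug)
examples = kb.top_examples()  # enriched KB flows back into the extractor
```

Each pass through the corpus grows the KB, and the grown KB in turn conditions the next extraction pass, which is the "extraction improves knowledge, knowledge improves extraction" cycle in miniature.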
Problem

Research questions and friction points this paper is trying to address.

information extraction
dynamic adaptation
domain-specific terminology
knowledge base evolution
structured text understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Self-Evolving System
Knowledge Base Enrichment
LLM-Powered Information Extraction
Graph-Based Reasoning
Closed-Loop Learning