Kastor: Fine-Tuned Small Language Models for Shape-Based Active Relation Extraction

📅 2025-11-05

🏛️ Extended Semantic Web Conference

📈 Citations: 1

✨ Influential: 0

career value

175K/year

🤖 AI Summary

This work addresses the low accuracy and poor generalization of relation extraction in domain-specific knowledge bases. We propose a fine-tuning method for small language models (SLMs) guided jointly by RDF schemas and SHACL shapes. Methodologically, we extend conventional single-attribute SHACL validation to multi-attribute compositional constraint modeling and integrate an iterative active learning mechanism, enabling fine-grained relation identification and novel fact discovery under limited annotated text and RDF data. Our key contributions are: (1) the first deep integration of SHACL shape semantics into SLM fine-tuning, endowing the model with schema-aware reasoning capabilities; and (2) enhanced noise robustness and generalization via attribute-coordinated evaluation. Experiments on knowledge base completion demonstrate that our approach significantly outperforms baseline models, achieving a 12.7% absolute improvement in relation extraction F1-score and effectively discovering domain-specific novel relations.

Technology Category

Application Category

📝 Abstract

RDF pattern-based extraction is a compelling approach for fine-tuning small language models (SLMs) by focusing a relation extraction task on a specified SHACL shape. This technique enables the development of efficient models trained on limited text and RDF data. In this article, we introduce Kastor, a framework that advances this approach to meet the demands for completing and refining knowledge bases in specialized domains. Kastor reformulates the traditional validation task, shifting from single SHACL shape validation to evaluating all possible combinations of properties derived from the shape. By selecting the optimal combination for each training example, the framework significantly enhances model generalization and performance. Additionally, Kastor employs an iterative learning process to refine noisy knowledge bases, enabling the creation of robust models capable of uncovering new, relevant facts

Problem

Research questions and friction points this paper is trying to address.

Fine-tuning small language models for shape-based relation extraction tasks

Enhancing model generalization by evaluating all property combinations from SHACL shapes

Refining noisy knowledge bases through iterative learning to discover new facts

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned small language models for relation extraction

Evaluates all property combinations from SHACL shapes

Uses iterative learning to refine noisy knowledge bases

🔎 Similar Papers

DiVA-DocRE: A Discriminative and Voice-Aware Paradigm for Document-Level Relation Extraction