M-BRe: Discovering Training Samples for Relation Extraction from Unlabeled Texts with Large Language Models

πŸ“… 2025-09-09
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
To address the high annotation cost, sparsity, and difficulty in discovering target relation instances in relation extraction, this paper proposes M-BReβ€”a zero-supervision framework for automatically mining high-quality training samples. First, it clusters relation types based on semantic similarity to mitigate LLMs’ insufficient semantic coverage in multi-class classification. Second, it designs a batch-wise binary classification mechanism per relation cluster to reduce LLM invocation overhead. Third, it introduces a cross-sample consistency-based label decision strategy to enhance annotation reliability. By synergistically integrating multi-class semantic modeling with the efficiency of binary classification, M-BRe achieves robust weak supervision without human-labeled data. Experiments on multiple benchmark datasets demonstrate that M-BRe significantly outperforms existing weakly supervised and LLM-prompting methods; relation extraction models trained on its generated data achieve average F1-score improvements of 3.2–5.7 percentage points.
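The three-stage pipeline described above (Relation Grouping, batched binary Relation Extraction, and consistency-based Label Decision) can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names, the greedy threshold-based grouping, and the majority-vote decision rule are all assumptions made for clarity.

```python
# Hypothetical sketch of the three M-BRe stages described above.
# All function names and heuristics here are illustrative assumptions,
# not the authors' actual implementation.
from collections import Counter

def group_relations(relations, similarity, threshold=0.7):
    """Relation Grouping: greedily cluster relation types whose
    pairwise semantic similarity exceeds a threshold (assumed heuristic)."""
    groups = []
    for rel in relations:
        placed = False
        for group in groups:
            if all(similarity(rel, other) >= threshold for other in group):
                group.append(rel)
                placed = True
                break
        if not placed:
            groups.append([rel])
    return groups

def extract_for_group(sentence, group, ask_llm):
    """Relation Extraction: one batched prompt asks the LLM a yes/no
    question for every relation in the group, instead of one call
    per relation type."""
    return [rel for rel in group if ask_llm(sentence, rel)]

def decide_label(sentence, groups, ask_llm, runs=3):
    """Label Decision: keep only labels that a majority of repeated
    LLM runs agree on (assumed consistency/voting rule)."""
    votes = Counter()
    for _ in range(runs):
        for group in groups:
            votes.update(extract_for_group(sentence, group, ask_llm))
    return [rel for rel, count in votes.items() if count > runs // 2]
```

In this sketch, `similarity` and `ask_llm` stand in for an embedding-based similarity function and an LLM prompt call; the batching in `extract_for_group` is what keeps the number of LLM invocations far below one call per relation type.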


πŸ“ Abstract
For Relation Extraction (RE), the manual annotation of training data may be prohibitively expensive, since the sentences that contain the target relations in texts can be very scarce and difficult to find. It is therefore beneficial to develop an efficient method that can automatically extract training instances from unlabeled texts for training RE models. Recently, large language models (LLMs) have been adopted in various natural language processing tasks, with RE also benefiting from their advances. However, when leveraging LLMs for RE with predefined relation categories, two key challenges arise. First, in a multi-class classification setting, LLMs often struggle to comprehensively capture the semantics of every relation, leading to suboptimal results. Second, although employing binary classification for each relation individually can mitigate this issue, it introduces significant computational overhead, resulting in impractical time complexity for real-world applications. Therefore, this paper proposes a framework called M-BRe to extract training instances from unlabeled texts for RE. It utilizes three modules to combine the advantages of both of the above classification approaches: Relation Grouping, Relation Extraction, and Label Decision. Extensive experiments confirm its superior capability in discovering high-quality training samples from unlabeled texts for RE.
Problem

Research questions and friction points this paper is trying to address.

Automatically extract training instances from unlabeled texts
Overcome LLM limitations in multi-class relation classification
Reduce computational overhead of binary classification approaches
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines multi-class and binary classification advantages
Uses Relation Grouping, Extraction, and Label Decision modules
Automatically extracts training samples from unlabeled texts
Zexuan Li
College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics, Nanjing, China
Hongliang Dai
Nanjing University of Aeronautics and Astronautics
Information Extraction Β· LLMs Β· Knowledge Graph
Piji Li
College of Artificial Intelligence, Nanjing University of Aeronautics and Astronautics, Nanjing, China