Combining Distantly Supervised Models with In-Context Learning for Monolingual and Cross-Lingual Relation Extraction

📅 2025-10-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Distantly supervised relation extraction (DSRE) faces two core challenges: label noise and the mismatch between bag-level supervision and sentence-level prediction; moreover, existing approaches have not effectively integrated the in-context learning (ICL) capabilities of large language models (LLMs), especially in cross-lingual, low-resource settings. This paper proposes HYDRE, a hybrid framework combining a trained DSRE model with ICL: the DSRE model proposes the top-k candidate relations for a test sentence, and a dynamic exemplar retrieval mechanism selects reliable sentence-level exemplars from the training data to ground the LLM prompt, mitigating label noise. HYDRE supports both monolingual and cross-lingual transfer, achieving gains of up to 20 F1 points on English benchmarks and, on average, 17 F1 points across four low-resource Indic languages, substantially outperforming prior state-of-the-art DSRE models. Key contributions include: (i) a noise-robust paradigm combining DSRE with ICL; (ii) a generalizable dynamic exemplar retrieval strategy; and (iii) a cross-lingual DSRE extension designed for low-resource languages.
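The three-step pipeline described above can be sketched as follows. This is a minimal, hypothetical illustration with toy stand-ins for the DSRE scores, training data, and retrieval; all function names are illustrative, not the authors' code.

```python
# Hypothetical sketch of a HYDRE-style pipeline (illustrative names, toy data).

def top_k_relations(dsre_scores, k=2):
    """Step 1: the trained DSRE model proposes k candidate relations."""
    return sorted(dsre_scores, key=dsre_scores.get, reverse=True)[:k]

def retrieve_exemplars(train_data, relation, m=2):
    """Step 2: dynamic exemplar retrieval -- pull up to m sentence-level
    exemplars labeled with `relation` from the training data. (The real
    framework scores exemplar reliability; here we simply take the first m.)"""
    return [(s, r) for s, r in train_data if r == relation][:m]

def build_prompt(sentence, candidates, exemplars):
    """Step 3: assemble an in-context learning prompt for the LLM."""
    lines = [f"Candidate relations: {', '.join(candidates)}"]
    lines += [f"Sentence: {s}\nRelation: {r}" for s, r in exemplars]
    lines.append(f"Sentence: {sentence}\nRelation:")
    return "\n".join(lines)

# Toy usage
scores = {"founded_by": 0.7, "born_in": 0.2, "employee_of": 0.1}
train = [("Gates founded Microsoft.", "founded_by"),
         ("Jobs founded Apple.", "founded_by"),
         ("Obama was born in Hawaii.", "born_in")]
cands = top_k_relations(scores, k=2)
exemplars = [ex for r in cands for ex in retrieve_exemplars(train, r)]
prompt = build_prompt("Musk founded SpaceX.", cands, exemplars)
```

The LLM then completes the final `Relation:` slot, choosing among the DSRE-proposed candidates rather than the full label space.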

📝 Abstract
Distantly Supervised Relation Extraction (DSRE) remains a long-standing challenge in NLP, where models must learn from noisy bag-level annotations while making sentence-level predictions. While existing state-of-the-art (SoTA) DSRE models rely on task-specific training, their integration with in-context learning (ICL) using large language models (LLMs) remains underexplored. A key challenge is that the LLM may not learn relation semantics correctly, due to noisy annotation. In response, we propose HYDRE -- HYbrid Distantly Supervised Relation Extraction framework. It first uses a trained DSRE model to identify the top-k candidate relations for a given test sentence, then uses a novel dynamic exemplar retrieval strategy that extracts reliable, sentence-level exemplars from training data, which are then provided in the LLM prompt for outputting the final relation(s). We further extend HYDRE to cross-lingual settings for RE in low-resource languages. Using available English DSRE training data, we evaluate all methods on English as well as a newly curated benchmark covering four diverse low-resource Indic languages -- Oriya, Santali, Manipuri, and Tulu. HYDRE achieves up to 20 F1 point gains in English and, on average, 17 F1 points on Indic languages over prior SoTA DSRE models. Detailed ablations exhibit HYDRE's efficacy compared to other prompting strategies.
Problem

Research questions and friction points this paper is trying to address.

Addressing noisy distant supervision in relation extraction
Integrating in-context learning with distantly supervised models
Extending relation extraction to low-resource cross-lingual settings
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines DSRE model with in-context learning
Uses dynamic exemplar retrieval from training data
Extends framework to cross-lingual low-resource settings
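The cross-lingual extension noted above can be illustrated with a small sketch: in-context exemplars come from the English DSRE training data, while the test sentence is in a low-resource target language and the relation label set is shared. This is a hypothetical illustration, not the authors' implementation.

```python
# Hypothetical illustration of HYDRE's cross-lingual setting: English
# exemplars ground the prompt; the test sentence is in a target language.

def build_crosslingual_prompt(test_sentence, english_exemplars):
    """Assemble an ICL prompt mixing English exemplars with a
    target-language test sentence (labels are language-independent)."""
    lines = ["Exemplars are in English; the last sentence may be in "
             "another language. Output only the relation label."]
    for sent, rel in english_exemplars:
        lines.append(f"Sentence: {sent}\nRelation: {rel}")
    lines.append(f"Sentence: {test_sentence}\nRelation:")
    return "\n".join(lines)

# Toy usage; the placeholder stands in for a sentence in one of the
# benchmark's languages (Oriya, Santali, Manipuri, or Tulu).
exemplars = [("Gates founded Microsoft.", "founded_by")]
prompt = build_crosslingual_prompt(
    "<test sentence in a low-resource Indic language>", exemplars)
```

The design choice here mirrors the paper's setup: no target-language training data is needed, since the DSRE model and the exemplars both come from English supervision.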