VaeDiff-DocRE: End-to-end Data Augmentation Framework for Document-level Relation Extraction

📅 2024-12-18

📈 Citations: 0

✨ Influential: 0

career value

149K/year

🤖 AI Summary

To address data imbalance caused by long-tail relations in document-level relation extraction (DocRE), this paper proposes the first end-to-end relation-aware data augmentation framework. Methodologically, it innovatively couples a variational autoencoder (VAE) with a diffusion model to jointly capture the complex multilabel relational distribution; high-quality, semantically coherent synthetic samples are generated within the entity-pair embedding space, and hierarchical joint training ensures seamless integration of the augmentation module with downstream DocRE tasks. Evaluated on the DocRED and CDR benchmarks, our approach substantially outperforms existing state-of-the-art methods, achieving a 12.3% absolute improvement in F1 score for tail relations. This effectively mitigates long-tail bias and establishes a novel paradigm for low-resource relation modeling.

Technology Category

Application Category

📝 Abstract

Document-level Relation Extraction (DocRE) aims to identify relationships between entity pairs within a document. However, most existing methods assume a uniform label distribution, resulting in suboptimal performance on real-world, imbalanced datasets. To tackle this challenge, we propose a novel data augmentation approach using generative models to enhance data from the embedding space. Our method leverages the Variational Autoencoder (VAE) architecture to capture all relation-wise distributions formed by entity pair representations and augment data for underrepresented relations. To better capture the multi-label nature of DocRE, we parameterize the VAE's latent space with a Diffusion Model. Additionally, we introduce a hierarchical training framework to integrate the proposed VAE-based augmentation module into DocRE systems. Experiments on two benchmark datasets demonstrate that our method outperforms state-of-the-art models, effectively addressing the long-tail distribution problem in DocRE.

Problem

Research questions and friction points this paper is trying to address.

Document-level Relation Extraction

Imbalanced Data

Long-tail Relations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Data Augmentation

Variational Autoencoder

Diffusion Model

🔎 Similar Papers

DiVA-DocRE: A Discriminative and Voice-Aware Paradigm for Document-Level Relation Extraction