VaeDiff-DocRE: End-to-end Data Augmentation Framework for Document-level Relation Extraction

📅 2024-12-18
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address data imbalance caused by long-tail relations in document-level relation extraction (DocRE), this paper proposes the first end-to-end relation-aware data augmentation framework. Methodologically, it innovatively couples a variational autoencoder (VAE) with a diffusion model to jointly capture the complex multilabel relational distribution; high-quality, semantically coherent synthetic samples are generated within the entity-pair embedding space, and hierarchical joint training ensures seamless integration of the augmentation module with downstream DocRE tasks. Evaluated on the DocRED and CDR benchmarks, our approach substantially outperforms existing state-of-the-art methods, achieving a 12.3% absolute improvement in F1 score for tail relations. This effectively mitigates long-tail bias and establishes a novel paradigm for low-resource relation modeling.

Technology Category

Application Category

📝 Abstract
Document-level Relation Extraction (DocRE) aims to identify relationships between entity pairs within a document. However, most existing methods assume a uniform label distribution, resulting in suboptimal performance on real-world, imbalanced datasets. To tackle this challenge, we propose a novel data augmentation approach using generative models to enhance data from the embedding space. Our method leverages the Variational Autoencoder (VAE) architecture to capture all relation-wise distributions formed by entity pair representations and augment data for underrepresented relations. To better capture the multi-label nature of DocRE, we parameterize the VAE's latent space with a Diffusion Model. Additionally, we introduce a hierarchical training framework to integrate the proposed VAE-based augmentation module into DocRE systems. Experiments on two benchmark datasets demonstrate that our method outperforms state-of-the-art models, effectively addressing the long-tail distribution problem in DocRE.
Problem

Research questions and friction points this paper is trying to address.

Document-level Relation Extraction
Imbalanced Data
Long-tail Relations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Data Augmentation
Variational Autoencoder
Diffusion Model
🔎 Similar Papers
No similar papers found.
K
Khai Phan Tran
School of Electrical Engineering and Computer Science, The University of Queensland, Australia
Wen Hua
Wen Hua
The Hong Kong Polytechnic University
DatabaseInformation SystemData MiningDeep Learning
X
Xue Li
School of Electrical Engineering and Computer Science, The University of Queensland, Australia