Relation Extraction Across Entire Books to Reconstruct Community Networks: The AffilKG Datasets

📅 2025-05-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing knowledge graph (KG) extraction results lack credibility assessments tailored to social science research, primarily because mainstream annotated datasets are too small, poorly connected, or overly complex. Method: We introduce AffilKG, the first collection of large-scale, book-level annotated KG datasets focused on person–organization affiliation relations; it comprises six datasets that pair complete book scans with labeled affiliation graphs, three of which also include expanded KGs with a wider variety of relation types. Leveraging OCR-derived text, we perform entity recognition, relation extraction, and lightweight affiliation graph annotation. Contribution/Results: AffilKG enables, for the first time, quantitative evaluation of how KG extraction errors propagate to downstream graph-level analytical tasks, e.g., community detection. Preliminary experiments reveal substantial variation in the performance of state-of-the-art models across books, underscoring AffilKG's utility for robust KG evaluation and trustworthy social network analysis.

📝 Abstract

When knowledge graphs (KGs) are automatically extracted from text, are they accurate enough for downstream analysis? Unfortunately, current annotated datasets cannot be used to evaluate this question, since their KGs are highly disconnected, too small, or overly complex. To address this gap, we introduce AffilKG (https://doi.org/10.5281/zenodo.15427977), a collection of six datasets that are the first to pair complete book scans with large, labeled knowledge graphs. Each dataset features affiliation graphs, which are simple KGs that capture Member relationships between Person and Organization entities -- useful in studies of migration, community interactions, and other social phenomena. In addition, three datasets include expanded KGs with a wider variety of relation types. Our preliminary experiments demonstrate significant variability in model performance across datasets, underscoring AffilKG's ability to enable two critical advances: (1) benchmarking how extraction errors propagate to graph-level analyses (e.g., community structure), and (2) validating KG extraction methods for real-world social science research.
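To make the notion of an affiliation graph concrete, here is a minimal Python sketch of one way such a graph could be represented and projected onto a person-to-person network. The triple layout, the entity names, and the helper functions `members_by_org` and `co_membership` are all assumptions made for illustration; they do not reflect the actual AffilKG file format.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical (person, relation, organization) triples; names are invented.
triples = [
    ("Ana", "Member", "Choir"),
    ("Ben", "Member", "Choir"),
    ("Ben", "Member", "Guild"),
    ("Cy", "Member", "Guild"),
]


def members_by_org(triples):
    """Group Person entities under each Organization via Member edges."""
    orgs = defaultdict(set)
    for person, relation, org in triples:
        if relation == "Member":
            orgs[org].add(person)
    return dict(orgs)


def co_membership(triples):
    """Project the bipartite graph onto persons.

    The weight of a person pair is the number of organizations they share.
    """
    weights = defaultdict(int)
    for people in members_by_org(triples).values():
        for a, b in combinations(sorted(people), 2):
            weights[(a, b)] += 1
    return dict(weights)


org_members = members_by_org(triples)
person_links = co_membership(triples)
```

The projection is the kind of network on which community-level questions (who interacts with whom through shared organizations) are typically asked, which is why a single wrong Member edge can matter beyond its own triple.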

Problem

Research questions and friction points this paper is trying to address.

Evaluate whether automatically extracted knowledge graphs are accurate enough for downstream analysis

Address lack of suitable annotated datasets for KG evaluation

Enable benchmarking of how KG extraction errors propagate to graph-level analyses in social science research
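The gap between edge-level accuracy and graph-level conclusions can be illustrated with a small Python sketch. It scores a predicted edge set against a gold one, then compares connected components as a deliberately crude stand-in for community structure. This is not the paper's evaluation protocol; the toy edges and the functions `edge_scores` and `components` are hypothetical.

```python
def edge_scores(gold, predicted):
    """Edge-level precision, recall, and F1 between two edge collections."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def components(edges):
    """Connected components via union-find (a simple proxy for communities)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for node in parent:
        groups.setdefault(find(node), set()).add(node)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical gold and predicted (person, organization) Member edges.
gold = [("Ana", "Choir"), ("Ben", "Choir"), ("Ben", "Guild"), ("Cy", "Guild")]
predicted = [("Ana", "Choir"), ("Ben", "Choir"), ("Cy", "Guild")]

precision, recall, f1 = edge_scores(gold, predicted)
gold_groups = components(gold)
predicted_groups = components(predicted)
```

In this toy case the extractor misses one edge out of four, so precision is 1.0 and recall is 0.75, yet the single connected group in the gold graph splits into two in the predicted graph. A real study would swap the component count for an actual community detection method and a partition-similarity measure.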

Innovation

Methods, ideas, or system contributions that make the work stand out.

AffilKG datasets pair complete book scans with large, labeled KGs

Affiliation graphs are simple KGs that capture Member relationships between Person and Organization entities

Three datasets add expanded KGs with a wider variety of relation types
