Quotegraph: A Social Network Extracted from Millions of News Quotations

📅 2025-07-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses a critical data gap in computational social science: the lack of large-scale, time-evolving social networks grounded in real-world discourse. We propose constructing such networks automatically from news quotations. Building on the Quotebank corpus (English news, 2008–2020), we design a language-agnostic pipeline comprising speaker identification, quotation detection, and Wikidata entity linking, which enables quotation-driven relation extraction at scale. The resulting Quotegraph dataset comprises 528,000 nodes (public figures) and 8.63 million directed edges pointing from speakers to the persons they mention. Each node is annotated with structured Wikidata attributes, including nationality, gender, and political affiliation, and each relation retains metadata about the context in which it was quoted. Quotegraph thus provides an empirically grounded, discourse-based dynamic network resource that complements online social networks. Because the pipeline is language agnostic, the approach extends to non-English news corpora and supports multidimensional social network analysis, offering a reproducible, scalable foundation for research in political communication, opinion dynamics, and related domains.
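
As an illustration of the node annotation step, here is a minimal Python sketch that copies nationality, gender, and political affiliation from a person's Wikidata claims onto a node record. The property IDs P27 (country of citizenship), P21 (sex or gender), and P102 (member of political party) are real Wikidata properties, but the flat `{property_id: [value QIDs]}` input and the `Node` and `annotate` names are hypothetical simplifications; raw Wikidata JSON is more deeply nested, and the actual Quotegraph pipeline may work differently.

```python
from dataclasses import dataclass, field

# Real Wikidata property IDs for the three attributes named in the summary.
NATIONALITY = "P27"  # country of citizenship
GENDER = "P21"       # sex or gender
PARTY = "P102"       # member of political party


@dataclass
class Node:
    """Hypothetical node record: one public figure identified by a Wikidata QID."""
    qid: str
    label: str
    nationality: list[str] = field(default_factory=list)
    gender: list[str] = field(default_factory=list)
    party: list[str] = field(default_factory=list)


def annotate(qid: str, label: str, claims: dict[str, list[str]]) -> Node:
    """Build a node from a simplified {property_id: [value_qid, ...]} claim map.

    Missing properties yield empty lists rather than errors, since many
    Wikidata items lack some of these statements.
    """
    return Node(
        qid=qid,
        label=label,
        nationality=list(claims.get(NATIONALITY, [])),
        gender=list(claims.get(GENDER, [])),
        party=list(claims.get(PARTY, [])),
    )
```

For example, `annotate("Q76", "Barack Obama", {"P27": ["Q30"], "P21": ["Q6581097"]})` returns a node whose `nationality` is `["Q30"]` and whose `party` list is empty because no P102 claim was supplied.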

📝 Abstract
We introduce Quotegraph, a novel large-scale social network derived from speaker-attributed quotations in English news articles published between 2008 and 2020. Quotegraph consists of 528 thousand unique nodes and 8.63 million directed edges, pointing from speakers to persons they mention. The nodes are linked to their corresponding items in Wikidata, thereby endowing the dataset with detailed biographic entity information, including nationality, gender, and political affiliation. Because Quotegraph is derived from Quotebank, a massive corpus of quotations, its relations are additionally enriched with information about the context in which they appear. Each part of the network construction pipeline is language agnostic, enabling the construction of similar datasets based on non-English news corpora. We believe Quotegraph is a compelling resource for computational social scientists, complementary to online social networks, with the potential to yield novel insights into the behavior of public figures and how it is captured in the news.
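
The edge definition in the abstract, directed from a speaker to each person they mention, can be sketched as a simple aggregation over quotation records. This is a minimal Python illustration, not the authors' implementation: the `Quote` record, the `build_edges` function, and the choice to drop self-mentions are assumptions made for the example. Each edge stores the IDs of its supporting quotations, standing in for the context information that Quotegraph attaches to relations.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """Hypothetical record: one speaker-attributed quotation with linked mentions."""
    quote_id: str
    speaker: str                # Wikidata QID of the speaker
    mentions: tuple[str, ...]   # QIDs of persons mentioned in the quotation


def build_edges(quotes) -> dict[tuple[str, str], list[str]]:
    """Aggregate quotations into directed speaker -> mentioned-person edges.

    Each edge maps to the list of quotation IDs that support it, in input
    order, so the context of every relation can be looked up later.
    """
    edges: dict[tuple[str, str], list[str]] = defaultdict(list)
    for q in quotes:
        # A person mentioned twice in one quotation counts once for that quotation.
        for target in set(q.mentions):
            if target != q.speaker:  # assumption: self-mentions are not edges
                edges[(q.speaker, target)].append(q.quote_id)
    return dict(edges)
```

Under this sketch, an edge's weight could simply be the number of supporting quotations, e.g. `len(edges[(source, target)])`, while the stored IDs keep each relation traceable to the quotations it came from.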

Problem

Constructing a large-scale social network from news quotations
Enriching network nodes with biographical data from Wikidata
Enabling multilingual network construction from non-English news

Innovation

Large-scale social network from news quotations
Nodes linked to Wikidata for entity details
Language-agnostic pipeline for non-English corpora