A City of Millions: Mapping Literary Social Networks At Scale

📅 2025-02-26
🤖 AI Summary
This study addresses the challenge of automatically constructing high-quality social network data from multilingual historical narrative texts, in support of empirical analysis of historical social structures in the humanities and social sciences. Method: It reformulates a previously manual social network annotation task as a structured large language model prompt, integrating multilingual coreference resolution with joint labeling of relationship type and affinity so that extraction stays consistent across languages and genres. Contribution/Results: The project releases 70,509 social networks extracted from fiction and nonfiction narratives, comprising about 1.19 million individuals and 2.8 million pairwise relationships. Metadata is provided for roughly 30,000 of these texts, which were written between 1800 and 1999 in 58 languages. The authors describe the scale as unprecedented for data on historical social worlds, and position the dataset as a resource for digital humanities research and for modeling how social realities were understood in the past.

📝 Abstract
We release 70,509 high-quality social networks extracted from multilingual fiction and nonfiction narratives. We additionally provide metadata for ~30,000 of these texts (73% nonfiction and 27% fiction) written between 1800 and 1999 in 58 languages. This dataset provides information on historical social worlds at an unprecedented scale, including data for 1,192,855 individuals in 2,805,482 pair-wise relationships annotated for affinity and relationship type. We achieve this scale by automating previously manual methods of extracting social networks; specifically, we adapt an existing annotation task as a language model prompt, ensuring consistency at scale with the use of structured output. This dataset provides an unprecedented resource for the humanities and social sciences by providing data on cognitive models of social realities.
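
To make the scale in the abstract concrete, the following is a minimal Python sketch of how one extracted network might be held in memory, together with the per-text averages implied by the quoted totals. The `Relationship` record, its field names, and the label values are assumptions made for illustration; the released dataset may use a different layout.

```python
from collections import Counter
from dataclasses import dataclass


# Hypothetical record layout; the dataset's actual field names may differ.
@dataclass(frozen=True)
class Relationship:
    source: str
    target: str
    relation_type: str  # illustrative label, e.g. "family"
    affinity: str       # illustrative label, e.g. "positive"


def degree_counts(relationships):
    """Count how many pairwise relationships each individual appears in."""
    counts = Counter()
    for rel in relationships:
        counts[rel.source] += 1
        counts[rel.target] += 1
    return counts


# Totals quoted in the abstract.
N_NETWORKS = 70_509
N_INDIVIDUALS = 1_192_855
N_RELATIONSHIPS = 2_805_482

# Corpus-level averages implied by those totals.
individuals_per_network = N_INDIVIDUALS / N_NETWORKS
relationships_per_network = N_RELATIONSHIPS / N_NETWORKS

# A toy network with placeholder names, not taken from the dataset.
toy_network = [
    Relationship("A", "B", "family", "positive"),
    Relationship("A", "C", "friend", "neutral"),
    Relationship("B", "C", "professional", "negative"),
]

if __name__ == "__main__":
    print(f"individuals per network:   {individuals_per_network:.1f}")
    print(f"relationships per network: {relationships_per_network:.1f}")
    print(dict(degree_counts(toy_network)))
```

On these totals, a typical text contributes roughly 17 individuals and 40 pairwise relationships. These are averages only and say nothing about how network sizes are distributed.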

Problem

Research questions and friction points this paper is trying to address.

Mapping large-scale literary social networks
Automating extraction of historical social data
Providing multilingual narrative social insights

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated social network extraction
Language model prompt adaptation
Structured output for consistency
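
One way structured output can support consistency is by rejecting any model response that does not match a fixed schema. The sketch below illustrates that idea using only Python's standard `json` module. The schema, the field names, and both label sets are assumptions made for illustration, not the paper's actual prompt format or annotation categories.

```python
import json

# Hypothetical label sets; the paper's actual categories are not listed here.
ALLOWED_RELATION_TYPES = {"family", "friend", "professional", "romantic", "other"}
ALLOWED_AFFINITIES = {"positive", "neutral", "negative"}
REQUIRED_FIELDS = ("source", "target", "relation_type", "affinity")


def parse_network(raw_response: str) -> list[dict]:
    """Parse one model response, raising ValueError if it breaks the schema."""
    try:
        payload = json.loads(raw_response)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc

    if not isinstance(payload, dict) or not isinstance(
        payload.get("relationships"), list
    ):
        raise ValueError("expected an object with a 'relationships' list")

    edges = []
    for i, item in enumerate(payload["relationships"]):
        if not isinstance(item, dict):
            raise ValueError(f"relationship {i} is not an object")
        missing = [f for f in REQUIRED_FIELDS if f not in item]
        if missing:
            raise ValueError(f"relationship {i} is missing fields: {missing}")
        if not all(isinstance(item[f], str) for f in REQUIRED_FIELDS):
            raise ValueError(f"relationship {i} has a non-string field")
        if item["relation_type"] not in ALLOWED_RELATION_TYPES:
            raise ValueError(f"relationship {i} has an unknown relation_type")
        if item["affinity"] not in ALLOWED_AFFINITIES:
            raise ValueError(f"relationship {i} has an unknown affinity")
        # Keep only the schema fields so every record has the same shape.
        edges.append({f: item[f] for f in REQUIRED_FIELDS})
    return edges


if __name__ == "__main__":
    example = (
        '{"relationships": [{"source": "A", "target": "B", '
        '"relation_type": "family", "affinity": "positive"}]}'
    )
    print(parse_network(example))
```

Because every accepted record has the same fields and a closed set of labels, networks extracted from different texts and languages can be merged and compared without per-text cleanup.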