🤖 AI Summary
This study addresses cross-version article matching and evolutionary analysis between two editions of the historical encyclopedia *Nordisk familjebok* (1876–1899 and 1904–1926). It proposes an end-to-end framework that combines semantic sentence embeddings for article matching, a Transformer-based classifier for identifying geographic entries, and Wikidata entity linking. Together these enable fine-grained article alignment and spatiotemporal tracking of content across editions. Applying the framework reveals a structural shift in geographical coverage: Europe's share declines in the second edition, while North America, Africa, Asia, Australia, and northern Scandinavia gain representation. This small but statistically significant change is attributed to the First World War and the rise of new powers. The work offers a reusable approach for longitudinal comparative analysis of historical texts in the digital humanities.
📝 Abstract
The *Nordisk familjebok* is a Swedish encyclopedia from the 19th and 20th centuries. It was written by a team of experts and aimed to be an intellectual reference, emphasizing precision and accuracy. The encyclopedia had four main editions, notable for their size, ranging from 20 to 38 volumes. As a consequence, the *Nordisk familjebok* had considerable influence in universities, schools, the media, and society at large. As new editions were released, the selection of entries and their content evolved, reflecting intellectual changes in Sweden.
In this paper, we used the digitized versions from *Project Runeberg*. We first resegmented the raw text into entries and matched pairs of entries between the first and second editions using semantic sentence embeddings. We then extracted the geographical entries from both editions using a transformer-based classifier and linked them to Wikidata. This enabled us to identify geographic trends and possible shifts between the first and second editions, written in 1876–1899 and 1904–1926, respectively.
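The entry-matching step can be illustrated with a minimal sketch. It assumes each entry has already been encoded into a vector by some sentence-embedding model (not shown) and pairs entries by cosine similarity, keeping only mutual nearest neighbours above a similarity threshold. The mutual-best-match rule, the `threshold` value, and the names `cosine` and `match_entries` are hypothetical choices for illustration, not necessarily the procedure used in the paper or its repository.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def match_entries(
    first: Dict[str, Vector],
    second: Dict[str, Vector],
    threshold: float = 0.8,  # hypothetical cut-off, would need tuning on real data
) -> List[Tuple[str, str, float]]:
    """Pair entries that are each other's nearest neighbour above `threshold`."""
    if not first or not second:
        return []
    # Best second-edition candidate (and its score) for each first-edition entry.
    best_12 = {
        a: max(((b, cosine(va, vb)) for b, vb in second.items()), key=lambda t: t[1])
        for a, va in first.items()
    }
    # Best first-edition candidate for each second-edition entry.
    best_21 = {
        b: max(((a, cosine(va, vb)) for a, va in first.items()), key=lambda t: t[1])[0]
        for b, vb in second.items()
    }
    pairs = []
    for a, (b, score) in best_12.items():
        # Keep only mutual best matches that are similar enough.
        if best_21[b] == a and score >= threshold:
            pairs.append((a, b, score))
    return pairs


# Toy 3-dimensional "embeddings"; real sentence embeddings have hundreds of dimensions.
first_ed = {"Uppsala": [0.9, 0.1, 0.0], "Lund": [0.1, 0.9, 0.1]}
second_ed = {
    "Uppsala": [0.85, 0.15, 0.05],
    "Lund": [0.05, 0.95, 0.0],
    "Kiruna": [0.0, 0.1, 0.99],
}
pairs = match_entries(first_ed, second_ed)
```

Here the new second-edition entry `Kiruna` stays unmatched because its nearest first-edition entry, `Lund`, already prefers the second-edition `Lund`. The all-pairs loop is quadratic, so editions with tens of thousands of entries would call for a vectorized or approximate nearest-neighbour search instead.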
Interpreting the results, we observe a small but significant shift in geographic focus away from Europe and towards North America, Africa, Asia, Australia, and northern Scandinavia from the first to the second edition, confirming the influence of the First World War and the rise of new powers. The code and data are available on GitHub at https://github.com/sibbo/nordisk-familjebok.
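The comparison of geographic coverage can be sketched as a per-region comparison of shares between the two editions. The sketch assumes every linked geographic entry has already been assigned a region label (for instance derived from its Wikidata item) and applies a two-sided two-proportion z-test to each region. The test, the labels, and the function names are illustrative assumptions, since the abstract does not state which statistical test the authors used.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def two_proportion_z(k1: int, n1: int, k2: int, n2: int) -> Tuple[float, float]:
    """Two-sided two-proportion z-test for k1/n1 vs k2/n2; returns (z, p_value)."""
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0  # identical, degenerate proportions: no evidence of a shift
    z = (k2 / n2 - k1 / n1) / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


def compare_editions(
    first: Iterable[str], second: Iterable[str]
) -> Dict[str, Tuple[float, float, float]]:
    """Map each region to (change in share, z, p) from the first to the second edition."""
    c1, c2 = Counter(first), Counter(second)
    n1, n2 = sum(c1.values()), sum(c2.values())
    result = {}
    for region in sorted(set(c1) | set(c2)):
        z, p = two_proportion_z(c1[region], n1, c2[region], n2)
        result[region] = (c2[region] / n2 - c1[region] / n1, z, p)
    return result


# Hypothetical region labels, not data from the paper.
res = compare_editions(
    ["Europe"] * 80 + ["Asia"] * 20,
    ["Europe"] * 70 + ["Asia"] * 30,
)
```

Testing each region separately treats entries as independent and ignores multiple comparisons; a chi-square test over the full region-by-edition contingency table would be a natural alternative.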