Matching Meaning at Scale: Evaluating Semantic Search for 18th-Century Intellectual History through the Case of Locke

📅 2026-05-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the challenge of detecting implicit references to and paraphrases of John Locke’s works in eighteenth-century intellectual history, where traditional lexical reuse methods often fall short. It presents the first systematic evaluation of off-the-shelf semantic search pipelines for identifying latent receptions of Locke’s ideas in large-scale historical corpora. The assessment integrates an expert-constructed semantic taxonomy, lexical overlap diagnostics, and comparative baseline approaches. Results demonstrate that semantic search retrieves substantially more implicit receptions than purely lexical methods. However, its efficacy remains constrained by a “lexical gatekeeping” effect: retrieval still depends on some degree of surface-level lexical overlap. The findings illuminate both the promise and limitations of current semantic technologies in tracing the dissemination of early modern philosophical thought.

📝 Abstract

While digitized corpora have transformed the study of intellectual transmission, current methods rely heavily on lexical text reuse detection, capturing verbatim quotations but fundamentally missing paraphrases and complex implicit engagement. This paper evaluates semantic search in 18th-century intellectual history through the reception of John Locke's foundational work. Using expert annotation grounded in a semantic taxonomy, we examine whether an off-the-shelf semantic search pipeline can surface meaning-level correspondences overlooked by lexical methods. Our results demonstrate that semantic search retrieves substantially more implicit receptions than lexical baselines. However, linguistic diagnostics also reveal a "lexical gatekeeping" effect, where retrieval remains partially constrained by surface vocabulary overlap. These findings highlight both the potential and the limitations of semantic retrieval for analyzing the circulation of ideas in large historical corpora. The data is available at https://github.com/COMHIS/locke-sim-data.
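To make the idea of a lexical overlap diagnostic concrete, the following is a minimal sketch of one way surface vocabulary overlap between a source passage and a candidate reception could be measured. The abstract does not specify which diagnostics the authors use; the Jaccard measure, the naive tokenizer, the function names, and the example sentences here are all assumptions chosen for illustration only.

```python
import re


def tokens(text: str) -> set:
    """Return the set of lowercase alphabetic word tokens.

    Hypothetical, deliberately naive tokenizer; a real study of
    18th-century text would likely need spelling normalization.
    """
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard_overlap(a: str, b: str) -> float:
    """Share of distinct word types that two passages have in common."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Invented example sentences, not drawn from the paper's data.
source = "the mind is white paper void of all characters"
verbatim = "the mind is white paper void of all characters"
paraphrase = "our understanding begins as a blank sheet without any marks"

print(jaccard_overlap(source, verbatim))
print(jaccard_overlap(source, paraphrase))
```

Under these assumptions, a verbatim quotation scores at the top of the scale while a paraphrase with no shared vocabulary scores at the bottom. A "lexical gatekeeping" effect would show up if the passages a semantic pipeline retrieves cluster toward the higher-overlap end, even when expert annotators judge low-overlap passages to be genuine receptions.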

Problem

Research questions and friction points this paper is trying to address.

semantic search

intellectual history

text reuse

paraphrase

idea transmission

Innovation

Methods, ideas, or system contributions that make the work stand out.

semantic search

intellectual history

lexical reuse