🤖 AI Summary
This study addresses challenges in historical document retrieval, such as linguistic evolution, terminological shifts, and cultural biases, that contribute to inequitable access to digital archives. To bridge this gap, the work integrates information retrieval with cultural analysis by constructing an inclusive evaluation benchmark for 19th-century English fiction and non-fiction, based on the British Library’s BL19 collection. Combining expert-crafted queries, paragraph-level relevance annotations, and large language model (LLM) assistance, the project investigates cross-genre knowledge transfer: how the narrative understanding and semantic richness found in fiction can enhance retrieval for scholarly and factual non-fiction documents. The framework improves retrieval accuracy and establishes a historically aware evaluation paradigm for information retrieval that prioritizes interpretability, transparency, and cultural inclusivity, thereby advancing more equitable and emancipatory knowledge infrastructures.
📝 Abstract
This work bridges the fields of information retrieval and cultural analytics to support equitable access to historical knowledge. Using the British Library BL19 digital collection (more than 35,000 works from 1700-1899), we construct a benchmark for studying changes in language, terminology, and retrieval in 19th-century fiction and non-fiction. Our approach combines expert-driven query design, paragraph-level relevance annotation, and Large Language Model (LLM) assistance to create a scalable evaluation framework grounded in human expertise. We focus on knowledge transfer from fiction to non-fiction, investigating how narrative understanding and semantic richness in fiction can improve retrieval for scholarly and factual materials. This interdisciplinary framework not only improves retrieval accuracy but also fosters interpretability, transparency, and cultural inclusivity in digital archives. Our work provides both practical evaluation resources and a methodological paradigm for developing retrieval systems that support richer, historically aware engagement with digital archives, ultimately working towards more emancipatory knowledge infrastructures.