Detecting Latin in Historical Books with Large Language Models: A Multimodal Benchmark

📅 2025-10-22
🤖 AI Summary
This study addresses a previously undefined task: automatically detecting Latin-language fragments in multilingual, typographically complex historical documents. We introduce the first annotated multimodal dataset designed for this task (724 pages), which combines textual content, layout structure, and language annotations. Methodologically, we pair large foundation models with heterogeneous inputs (OCR-extracted text and document layout representations) to perform language identification and spatial localization across modalities. Experiments evaluate state-of-the-art large language and vision-language models on this task and show that reliable Latin detection is achievable. Key contributions include: (1) a formal definition of the Latin fragment detection task; (2) the release of the first dedicated benchmark dataset; and (3) an empirical analysis of both the promise and the limitations of foundation models for real-world historical document digitization, which lays groundwork for automated cultural heritage processing.
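To make the shape of the task concrete (a language decision attached to a location on the page), here is a minimal sketch of a naive rule-based baseline. It is not the authors' method, which evaluates large foundation models; the `OcrLine` structure, the `LATIN_MARKERS` word list, the `latin_score` and `detect_latin` helpers, and the 0.15 threshold are all hypothetical choices made for illustration. Each OCR line is flagged as Latin when enough of its words are common Latin function words, and the line's bounding box is returned as the localization.

```python
import re
from dataclasses import dataclass

# Hypothetical bounding-box type: (x0, y0, x1, y1) in page pixels.
BBox = tuple[int, int, int, int]


@dataclass
class OcrLine:
    """One OCR'd text line plus its layout box (hypothetical structure)."""
    text: str
    bbox: BBox


# Illustrative Latin function words, picked to avoid frequent English,
# French, and German words. This list is an assumption, not from the paper.
LATIN_MARKERS = {
    "atque", "autem", "enim", "quod", "quae", "quam", "sunt", "esse",
    "sed", "cum", "etiam", "ergo", "nec", "vel", "ab", "hoc", "ut",
}


def latin_score(text: str) -> float:
    """Fraction of alphabetic tokens in `text` that are Latin marker words."""
    # [^\W\d_]+ matches runs of Unicode letters only (no digits or underscores).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in LATIN_MARKERS)
    return hits / len(tokens)


def detect_latin(lines: list[OcrLine], threshold: float = 0.15) -> list[tuple[BBox, str]]:
    """Return (bbox, text) for every line whose marker ratio reaches `threshold`.

    Pairing each decision with a bounding box mirrors the
    identification-plus-localization output the task calls for.
    """
    return [(ln.bbox, ln.text) for ln in lines if latin_score(ln.text) >= threshold]


if __name__ == "__main__":
    page = [
        OcrLine("The author writes as follows:", (50, 100, 900, 130)),
        OcrLine("Omnia autem quae sunt, ab uno esse dicuntur.", (50, 140, 900, 170)),
        OcrLine("Der Verfasser fährt dann fort.", (50, 180, 900, 210)),
    ]
    for bbox, text in detect_latin(page):
        print(bbox, text)
```

On this toy page only the second line is flagged. A word-list heuristic like this misses short Latin quotations with few function words and cannot use visual cues such as typeface changes, which are exactly the cases where multimodal foundation models are expected to help.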

📝 Abstract
This paper presents a novel task: extracting Latin fragments from mixed-language historical documents with varied layouts. We benchmark large foundation models on a multimodal dataset of 724 annotated pages. The results demonstrate that reliable Latin detection with contemporary models is achievable. Our study provides the first comprehensive analysis of these models' capabilities and limits for this task.
Problem

Research questions and friction points this paper is trying to address.

Extracting Latin fragments from mixed-language historical documents
Evaluating large foundation models on a multimodal annotated dataset
Assessing model capabilities and limits for Latin detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Defining Latin fragment extraction from mixed-language historical documents as a new task
Benchmarking large foundation models on a 724-page multimodal annotated dataset
Providing the first comprehensive analysis of model capabilities and limits for this task