ClapperText: A Benchmark for Text Recognition in Low-Resource Archival Documents

📅 2025-10-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Recognizing handwritten and printed text, particularly slate metadata, in low-quality, resource-scarce archival videos from World War II remains highly challenging due to severe degradation, occlusion, and handwriting variability. Method: We introduce the first fine-grained OCR benchmark dataset specifically designed for degraded historical documents: it comprises 127 slate-containing video clips (9,813 frames, 94,573 word instances), annotated with rotated quadrilaterals and provided in both full-frame and cropped word-image formats. We propose a novel annotation protocol that explicitly models handwriting variants, occlusions, and extreme visual degradation. Contribution/Results: With only 18 of the 127 videos used for fine-tuning, six recognition and seven detection models achieve significant performance gains, demonstrating the dataset's effectiveness and robustness for few-shot OCR research. This work fills a critical gap in OCR benchmarks for low-resource historical document analysis.

📝 Abstract
This paper presents ClapperText, a benchmark dataset for handwritten and printed text recognition in visually degraded and low-resource settings. The dataset is derived from 127 World War II-era archival video segments containing clapperboards that record structured production metadata such as date, location, and camera-operator identity. ClapperText includes 9,813 annotated frames and 94,573 word-level text instances, 67% of which are handwritten and 1,566 are partially occluded. Each instance includes transcription, semantic category, text type, and occlusion status, with annotations available as rotated bounding boxes represented as 4-point polygons to support spatially precise OCR applications. Recognizing clapperboard text poses significant challenges, including motion blur, handwriting variation, exposure fluctuations, and cluttered backgrounds, mirroring broader challenges in historical document analysis where structured content appears in degraded, non-standard forms. We provide both full-frame annotations and cropped word images to support downstream tasks. Using a consistent per-video evaluation protocol, we benchmark six representative recognition and seven detection models under zero-shot and fine-tuned conditions. Despite the small training set (18 videos), fine-tuning leads to substantial performance gains, highlighting ClapperText's suitability for few-shot learning scenarios. The dataset offers a realistic and culturally grounded resource for advancing robust OCR and document understanding in low-resource archival contexts. The dataset and evaluation code are available at https://github.com/linty5/ClapperText.
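The abstract describes each word instance as carrying a transcription, a semantic category, a text type, an occlusion flag, and a rotated bounding box stored as a 4-point polygon. A minimal sketch of what parsing one such record could look like is below; the JSON field names and the sample values are illustrative assumptions, not ClapperText's actual schema.

```python
import json
from dataclasses import dataclass

# Hypothetical record layout; field names are assumptions,
# not the dataset's published format.
SAMPLE = '''{
  "transcription": "CAMERAMAN",
  "category": "camera-operator",
  "text_type": "handwritten",
  "occluded": false,
  "polygon": [[120, 40], [310, 52], [308, 95], [118, 83]]
}'''

@dataclass
class WordInstance:
    transcription: str
    category: str       # semantic slot on the clapperboard, e.g. date or location
    text_type: str      # "handwritten" or "printed"
    occluded: bool
    polygon: list       # four (x, y) corner points of the rotated box

    @classmethod
    def from_json(cls, s: str) -> "WordInstance":
        d = json.loads(s)
        assert len(d["polygon"]) == 4, "rotated box must have exactly 4 points"
        return cls(d["transcription"], d["category"], d["text_type"],
                   d["occluded"], [tuple(p) for p in d["polygon"]])

word = WordInstance.from_json(SAMPLE)
print(word.transcription, word.text_type, word.occluded)
```

Keeping the polygon as raw corner points (rather than a center/angle box) preserves the spatial precision the paper emphasizes and lets downstream code choose its own box convention.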

Problem

Research questions and friction points this paper is trying to address.

Benchmarks text recognition in degraded low-resource archival documents
Addresses challenges of handwriting variation and partial occlusion
Provides dataset for few-shot learning in historical document analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dataset with annotated historical clapperboard frames
Rotated bounding boxes for precise OCR applications
Benchmarks for six recognition and seven detection models under zero-shot and fine-tuned conditions
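Many detectors and recognizers expect axis-aligned boxes rather than rotated quadrilaterals, so a common preprocessing step is to derive an axis-aligned bounding box (and optionally the polygon's area, for size filtering) from the 4-point annotation. The helpers below are an illustrative sketch of that step, not part of the ClapperText tooling.

```python
# Convert a 4-point rotated box to an axis-aligned bounding box,
# and compute the polygon's area via the shoelace formula.
# Illustrative helpers; not taken from the paper's code.

def axis_aligned_bbox(poly):
    """Return (xmin, ymin, xmax, ymax) enclosing the polygon."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    return (min(xs), min(ys), max(xs), max(ys))

def polygon_area(poly):
    """Area of a simple polygon via the shoelace formula."""
    n = len(poly)
    s = 0.0
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

quad = [(120, 40), (310, 52), (308, 95), (118, 83)]
print(axis_aligned_bbox(quad))  # (118, 40, 310, 95)
print(polygon_area(quad))       # 8194.0
```

Note that the axis-aligned box is loose for strongly rotated text; the rotated polygon remains the precise ground truth for evaluation.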