AI Summary
Historical document text-line segmentation suffers from a severe scarcity of annotated training data, owing to high expert annotation costs and the limited availability of labeled manuscripts. To address this, we propose a lightweight UNet++ architecture trained with a connectivity-aware loss function originally developed for neuron morphology. This combination models text-line topology precisely in an extremely low-data regime of just three annotated pages per manuscript. Our method employs patch-based training and aggressive data augmentation to improve generalization. Evaluated on the U-DIADS-TL dataset, it improves on the previous state of the art by 200% in Recognition Accuracy and by 75% in line-level Intersection-over-Union (IoU). On DIVA-HisDB, it attains an F-measure competitive with the top-performing systems of that competition. The core contribution is the first integration of a connectivity-aware loss into few-shot text-line segmentation, yielding an end-to-end solution that attains high accuracy while drastically reducing the dependence on annotation.
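Patch-based training turns a handful of annotated pages into many training samples. The following minimal pure-Python sketch illustrates the idea under stated assumptions. It crops aligned random windows from a page image and its label mask and applies a horizontal flip to both, so that every pixel keeps its label. The patch size, the number of patches per page, the flip augmentation, and the names `random_patch` and `sample_patches` are hypothetical. They are not taken from the paper or its repository.

```python
import random
from typing import List, Tuple

Grid = List[List[int]]  # a page image or label mask as rows of pixel values


def random_patch(image: Grid, mask: Grid, size: int,
                 rng: random.Random) -> Tuple[Grid, Grid]:
    """Crop the same random size x size window from an image and its mask."""
    h, w = len(image), len(image[0])
    if size > h or size > w:
        raise ValueError("patch size exceeds page size")
    top = rng.randint(0, h - size)    # randint includes both endpoints
    left = rng.randint(0, w - size)
    img = [row[left:left + size] for row in image[top:top + size]]
    msk = [row[left:left + size] for row in mask[top:top + size]]
    # Hypothetical augmentation: a horizontal flip applied to both arrays,
    # which preserves the pixel-to-label alignment.
    if rng.random() < 0.5:
        img = [row[::-1] for row in img]
        msk = [row[::-1] for row in msk]
    return img, msk


def sample_patches(pages: List[Tuple[Grid, Grid]], size: int,
                   per_page: int, seed: int = 0) -> List[Tuple[Grid, Grid]]:
    """Draw `per_page` aligned (image, mask) patches from every annotated page."""
    rng = random.Random(seed)
    return [random_patch(img, msk, size, rng)
            for img, msk in pages for _ in range(per_page)]
```

For example, with three toy 8x8 pages, `sample_patches(pages, size=4, per_page=5)` returns 15 aligned 4x4 image/mask pairs. Raising `per_page` is what lets three pages supply many distinct training samples.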
Abstract
A foundational task for the digital analysis of documents is text line segmentation. However, automating this process with deep learning models is challenging because it requires large, annotated datasets that are often unavailable for historical documents. Additionally, annotation is labor- and cost-intensive and requires expert knowledge, which makes few-shot learning a promising direction for reducing data requirements. In this work, we demonstrate that small and simple architectures, coupled with a topology-aware loss function, are more accurate and data-efficient than more complex alternatives. We pair a lightweight UNet++ with a connectivity-aware loss, initially developed for neuron morphology, which explicitly penalizes structural errors like line fragmentation and unintended line merges. To make the most of our limited data, we train on small patches extracted from a mere three annotated pages per manuscript. Our methodology significantly improves upon the current state of the art on the U-DIADS-TL dataset, with a 200% increase in Recognition Accuracy and a 75% increase in Line Intersection over Union. Our method also achieves an F-Measure on par with, or even exceeding, that of the winner of the DIVA-HisDB baseline detection competition, all while requiring only three annotated pages, exemplifying the efficacy of our approach. Our implementation is publicly available at: https://github.com/RafaelSterzinger/acpr_few_shot_hist.
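The connectivity-aware loss is described as penalizing two structural errors: line fragmentation and unintended line merges. That loss is differentiable and applied during training, and it is not reproduced here. As a hedged illustration only, the following pure-Python sketch counts those two error types after the fact. It assumes a ground-truth instance map (one integer id per text line, 0 for background) and a binary prediction. A ground-truth line counts as fragmented if it overlaps more than one predicted connected component. A predicted component counts as a merge if it overlaps more than one ground-truth line. The 4-connectivity choice and the names `components` and `topology_errors` are assumptions for illustration.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Grid = List[List[int]]


def components(mask: Grid) -> Grid:
    """Label 4-connected non-zero regions 1, 2, ...; background stays 0."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                next_label += 1
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:  # breadth-first flood fill of one component
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx),
                                   (cy, cx + 1), (cy, cx - 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels


def topology_errors(gt_lines: Grid, pred: Grid) -> Tuple[int, int]:
    """Return (fragmented ground-truth lines, merged predicted components)."""
    pred_labels = components(pred)
    gt_to_pred: Dict[int, Set[int]] = {}
    pred_to_gt: Dict[int, Set[int]] = {}
    for y, row in enumerate(gt_lines):
        for x, line_id in enumerate(row):
            comp = pred_labels[y][x]
            if line_id and comp:  # pixel is text in both maps
                gt_to_pred.setdefault(line_id, set()).add(comp)
                pred_to_gt.setdefault(comp, set()).add(line_id)
    fragmented = sum(1 for comps in gt_to_pred.values() if len(comps) > 1)
    merged = sum(1 for ids in pred_to_gt.values() if len(ids) > 1)
    return fragmented, merged
```

For example, take two horizontal lines and a prediction that breaks line 1 with a one-pixel gap and bridges it to line 2 through a short vertical stroke. This prediction yields `(1, 1)`: one fragmented line and one merged component. Only a few pixels differ from the ground truth, so a purely pixel-wise score barely changes. This motivates an objective that explicitly accounts for connectivity.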