DocTTT: Test-Time Training for Handwritten Document Recognition Using Meta-Auxiliary Learning

📅 2025-01-22
🤖 AI Summary
Handwritten document recognition suffers from poor generalization under complex backgrounds, diverse handwriting styles, and layout variations, compounded by severe scarcity of labeled data. To address this, we propose a test-time adaptation framework that jointly integrates meta-learning (MAML-style) with masked autoencoding (MAE) into an end-to-end optimized test-time training (TTT) mechanism. Crucially, our method enables real-time visual representation adaptation from a single unlabeled test sample—eliminating reliance on large-scale annotated datasets or conventional fine-tuning. During inference, it dynamically updates model parameters, substantially enhancing robustness and generalization in few-shot settings. Evaluated on multiple standard handwritten document benchmarks, our approach surpasses current state-of-the-art methods, achieving absolute accuracy gains of 5.2–8.7 percentage points in few-shot recognition.

📝 Abstract
Despite significant recent advances in Handwritten Document Recognition (HDR), recognizing text efficiently and accurately against complex backgrounds, across diverse handwriting styles, and under varying document layouts remains a practical challenge. This problem is seldom addressed in academic research, particularly when only minimal annotated data is available. In this paper, we introduce the DocTTT framework to address these challenges. The key innovation of our approach is test-time training: the model is adapted to each specific input during testing. We propose a novel Meta-Auxiliary learning approach that combines meta-learning with a self-supervised Masked Autoencoder (MAE). During testing, we adapt the visual representation parameters using a self-supervised MAE loss. During training, we learn the model parameters within a meta-learning framework, so that they can be adapted effectively to a new input. Experimental results show that our proposed method significantly outperforms existing state-of-the-art approaches on benchmark datasets.
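The test-time step described in the abstract can be illustrated with a toy sketch. This is not the DocTTT implementation: the tiny tied-weight model, the function names (`mae_loss_and_grad`, `adapt_at_test_time`), the learning rate, and the step count are all assumptions made for illustration. It only shows the general idea of taking a few gradient steps on a masked-reconstruction loss computed from one unlabeled test input, starting from previously learned parameters. The meta-learning stage that would produce those starting parameters is not shown.

```python
# Toy sketch of test-time training with a masked-reconstruction loss.
# The model, names, and hyperparameters are hypothetical, not the DocTTT code.

def mae_loss_and_grad(w, x, hidden):
    """Masked reconstruction loss and gradient for a tiny tied-weight model.

    Visible entries of x are encoded into one latent z; each hidden entry j
    is reconstructed as w[j] * z and compared with the true value x[j].
    """
    visible = [i for i in range(len(x)) if i not in hidden]
    z = sum(w[i] * x[i] for i in visible)
    residual = {j: w[j] * z - x[j] for j in hidden}
    n = len(hidden)
    loss = sum(r * r for r in residual.values()) / n

    grad = [0.0] * len(w)
    for j, r in residual.items():
        grad[j] = 2.0 * r * z / n  # decoder side: d(w[j] * z) / d w[j] = z
    dloss_dz = sum(2.0 * r * w[j] for j, r in residual.items()) / n
    for k in visible:
        grad[k] = dloss_dz * x[k]  # encoder side: dz / d w[k] = x[k]
    return loss, grad


def adapt_at_test_time(w, x, hidden, lr=0.1, steps=5):
    """Return a per-sample copy of w after a few self-supervised steps."""
    adapted = list(w)  # the shared starting parameters stay untouched
    for _ in range(steps):
        _, grad = mae_loss_and_grad(adapted, x, hidden)
        adapted = [p - lr * g for p, g in zip(adapted, grad)]
    return adapted


# Assumed starting parameters (stand-in for meta-learned weights).
w_meta = [0.1, 0.1, 0.1, 0.1]
# One unlabeled test sample and the indices masked out of it.
x_test = [1.0, 0.5, -0.5, 2.0]
hidden = (1, 3)

loss_before, _ = mae_loss_and_grad(w_meta, x_test, hidden)
w_adapted = adapt_at_test_time(w_meta, x_test, hidden)
loss_after, _ = mae_loss_and_grad(w_adapted, x_test, hidden)
print(loss_before, loss_after)
```

Adapting a copy rather than the shared parameters reflects the per-input nature of the adaptation: each test sample starts from the same learned initialization. In a real system the adapted parameters would then be used for the recognition forward pass on that same sample.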

Problem

Research questions and friction points this paper is trying to address.

Handwritten Document Recognition
Complex Backgrounds
Limited Data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Meta-Learning
Self-Supervised Masked Autoencoder
Personalized Adaptation

Authors

Wenhao Gu
Department of Computer Science and Software Engineering, Concordia University
Li Gu
Department of Computer Science and Software Engineering, Concordia University
Ziqiang Wang
Concordia University
Computer Vision
Ching Yee Suen
Department of Computer Science and Software Engineering, Concordia University
Yang Wang
Department of Computer Science and Software Engineering, Concordia University