Preserving Privacy Without Compromising Accuracy: Machine Unlearning for Handwritten Text Recognition

📅 2025-04-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses privacy-leakage risks and “right-to-be-forgotten” compliance requirements in handwritten text recognition (HTR), introducing machine unlearning to HTR for the first time. The authors propose a two-stage unlearning strategy: the writer classification head of a multi-head Transformer serves as an unlearning indicator and trigger, combined with structured pruning and random label injection, to selectively erase sensitive handwriting-style information. On multiple HTR benchmarks, the method achieves over a 98% unlearning success rate while degrading character recognition accuracy by less than 0.5%, significantly outperforming baselines. Moreover, membership inference attack success drops by over 40%, confirming robust privacy protection. The approach simultaneously satisfies legal compliance (e.g., GDPR’s right to erasure) and preserves model utility, establishing a practical, privacy-enhancing paradigm for real-world HTR systems.
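The two-stage recipe summarized above (random label injection through the writer classification head, followed by structured pruning) can be sketched in miniature. This is a NumPy toy with assumed shapes and hyperparameters (a single linear writer head, a 30% pruning ratio), not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "writer classification head": one linear layer over shared features.
n_writers, feat_dim = 10, 16
W = rng.normal(size=(n_writers, feat_dim))  # one weight row per writer

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Stage 1: random label injection -- fine-tune the head on the forget set
# with randomly reassigned writer labels, scrambling writer-identity signal.
def random_label_step(W, x, true_writer, lr=0.1):
    fake = rng.choice([w for w in range(n_writers) if w != true_writer])
    p = softmax(W @ x)
    grad = np.outer(p, x)   # dCE/dW for softmax + cross-entropy
    grad[fake] -= x         # subtract the one-hot (fake) target row
    return W - lr * grad

# Stage 2: structured pruning -- zero whole head rows (writer units)
# with the smallest L2 norm, removing capacity that encodes writer style.
def prune_rows(W, ratio=0.3):
    norms = np.linalg.norm(W, axis=1)
    k = int(len(norms) * ratio)
    W = W.copy()
    W[np.argsort(norms)[:k]] = 0.0
    return W

# Simulate unlearning one writer's samples, then prune.
for _ in range(50):
    x = rng.normal(size=feat_dim)
    W = random_label_step(W, x, true_writer=3)
W = prune_rows(W)
print("zeroed head rows:", int((np.linalg.norm(W, axis=1) == 0).sum()))
```

The recognition head is untouched in this sketch, mirroring the paper's goal of degrading writer identification while preserving character recognition.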

📝 Abstract
Handwritten Text Recognition (HTR) is essential for document analysis and digitization. However, handwritten data often contains user-identifiable information, such as unique handwriting styles and personal lexicon choices, which can compromise privacy and erode trust in AI services. Legislation like the “right to be forgotten” underscores the necessity for methods that can expunge sensitive information from trained models. Machine unlearning addresses this by selectively removing specific data from models without necessitating complete retraining. Yet, it frequently encounters a privacy-accuracy tradeoff, where safeguarding privacy leads to diminished model performance. In this paper, we introduce a novel two-stage unlearning strategy for a multi-head transformer-based HTR model, integrating pruning and random labeling. Our proposed method utilizes a writer classification head both as an indicator and a trigger for unlearning, while maintaining the efficacy of the recognition head. To our knowledge, this represents the first comprehensive exploration of machine unlearning within HTR tasks. We further employ Membership Inference Attacks (MIA) to evaluate the effectiveness of unlearning user-identifiable information. Extensive experiments demonstrate that our approach effectively preserves privacy while maintaining model accuracy, paving the way for new research directions in the document analysis community. Our code will be publicly available upon acceptance.
Problem

Research questions and friction points this paper is trying to address.

Machine unlearning for privacy in handwritten text recognition
Balancing privacy and accuracy in AI models
Removing user-identifiable data without full retraining
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage unlearning with pruning and random labeling
Writer classification head as unlearning trigger
Membership Inference Attacks for privacy evaluation
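Membership inference evaluation, as listed above, asks whether an attacker can tell forgotten samples apart from unseen ones. A common simple form is a loss-threshold attack; the sketch below illustrates the idea on synthetic loss values (the Gaussian loss distributions and the post-unlearning overlap are assumptions for illustration, not results from the paper):

```python
import random

random.seed(0)

def threshold_attack_accuracy(member_losses, nonmember_losses):
    """Best single-threshold accuracy at separating members (low loss)
    from non-members (high loss); 0.5 means the attack is guessing."""
    n = len(member_losses) + len(nonmember_losses)
    best = 0.5
    for t in sorted(member_losses + nonmember_losses):
        hits = (sum(l < t for l in member_losses)
                + sum(l >= t for l in nonmember_losses))
        best = max(best, hits / n)
    return best

# Before unlearning: members have noticeably lower loss (assumed gap).
before_members = [random.gauss(0.3, 0.1) for _ in range(200)]
before_nonmembers = [random.gauss(0.9, 0.1) for _ in range(200)]

# After unlearning: the two loss distributions overlap (assumed).
after_members = [random.gauss(0.9, 0.1) for _ in range(200)]
after_nonmembers = [random.gauss(0.9, 0.1) for _ in range(200)]

acc_before = threshold_attack_accuracy(before_members, before_nonmembers)
acc_after = threshold_attack_accuracy(after_members, after_nonmembers)
print(f"MIA accuracy before: {acc_before:.2f}, after: {acc_after:.2f}")
```

Attack accuracy falling toward 0.5 after unlearning is the signal the paper uses as evidence that writer-identifying traces of the forgotten data were removed.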
Lei Kang
Computer Vision Center, Universitat Autònoma de Barcelona, Barcelona, Spain
Xuanshuo Fu
Autonomous University of Barcelona
Theoretical Computer Science, Machine Learning
Lluís Gómez
Computer Vision Center, Universitat Autònoma de Barcelona, Barcelona, Spain
Alicia Fornés
Computer Vision Center, Universitat Autònoma de Barcelona, Barcelona, Spain
Ernest Valveny
Computer Vision Center - Universitat Autònoma de Barcelona
Dimosthenis Karatzas
Computer Vision Center, Universitat Autònoma de Barcelona
computer vision, document analysis, vision and language, reading systems