Learning to Decipher from Pixels -- A Case Study of Copiale

📅 2026-04-26

🤖 AI Summary
This work addresses the inefficiency and error-proneness of traditional two-stage approaches to deciphering historical cipher manuscripts, which first transcribe the handwritten symbols and then decrypt the transcription. That intermediate step is labor-intensive and not always aligned with the actual goal of recovering plaintext. To overcome this limitation, we propose an end-to-end, transcription-free approach that maps images of handwritten cipher text directly to plaintext. We construct the first text-line-level dataset pairing Copiale cipher images with German plaintext, and we train by pretraining on generic handwriting data and then fine-tuning on cipher-specific data. Experimental results show that this pretraining substantially improves decipherment accuracy, supporting the feasibility and effectiveness of end-to-end decipherment for historical substitution ciphers.
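
To make the two-stage pipeline described above concrete, here is a minimal sketch of its second stage and of how a single transcription error carries through to the plaintext. The symbol names (`s1`, `s2`, ...) and the key are invented for illustration; they are not the real Copiale key, and this is not the authors' code.

```python
# Hypothetical toy key: symbol names and mappings are invented for illustration.
# Two symbols ("s2" and "s4") share a plaintext letter, as a substitution key may allow.
TOY_KEY = {
    "s1": "d",
    "s2": "e",
    "s3": "r",
    "s4": "e",
    "s5": "i",
    "s6": "n",
}


def decrypt(transcription, key, unknown="?"):
    """Stage 2 of a transcription-first pipeline.

    Maps each transcribed symbol token to a plaintext letter.
    Tokens missing from the key become `unknown`.
    """
    return "".join(key.get(token, unknown) for token in transcription)


# Stage 1 output when every symbol is read correctly.
correct_transcription = ["s1", "s2", "s3"]
# Stage 1 output with one symbol misread (s2 confused with s5).
misread_transcription = ["s1", "s5", "s3"]

print(decrypt(correct_transcription, TOY_KEY))
print(decrypt(misread_transcription, TOY_KEY))
```

In this toy setting the misread symbol silently changes the recovered word, and stage 2 has no access to the image to correct it. An image-to-plaintext model removes that hand-off by predicting plaintext characters from the pixels directly.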

📝 Abstract
Historical encrypted manuscripts require both paleographic interpretation of cipher symbols and cryptanalytic recovery of plaintext. Most existing computational workflows rely on a transcription-first paradigm, in which handwritten symbols are transcribed prior to decipherment. This intermediate step is labor-intensive, error-prone, and not always aligned with the goal of direct plaintext recovery. We propose an end-to-end, transcription-free approach that directly maps handwritten cipher images to plaintext. Using the Copiale cipher as a case study, we introduce the first text-line-level dataset pairing cipher images with German plaintext. We show that pretraining on generic handwriting data followed by cipher-specific fine-tuning substantially improves decipherment accuracy. Our results demonstrate that transcription-free image-to-plaintext decipherment is both feasible and effective for historical substitution ciphers, offering a simplified and scalable alternative to traditional pipelines.

Repository: https://github.com/leitro/Decipher-from-Pixels-Copiale
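
The abstract reports decipherment accuracy without naming a metric in this excerpt. One common way to score a predicted plaintext line against a reference is character error rate (CER), the edit distance divided by the reference length. The sketch below assumes that metric for illustration only; whether the paper uses CER is an assumption, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute cost 1)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,                      # delete char_a
                    current[j - 1] + 1,                   # insert char_b
                    previous[j - 1] + substitution_cost,  # match or substitute
                )
            )
        previous = current
    return previous[-1]


def character_error_rate(prediction, reference):
    """CER = edit distance / reference length; lower is better."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(prediction, reference) / len(reference)


# Toy example: one wrong character in a three-character line.
print(character_error_rate("dir", "der"))
```

Scoring at the character level fits line-level image-to-plaintext output, because it gives partial credit for a mostly correct line rather than counting the whole line as wrong.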

Problem

Research questions and friction points this paper is trying to address.

historical encrypted manuscripts
cipher decipherment
transcription-free
handwritten cipher images
plaintext recovery

Innovation

Methods, ideas, or system contributions that make the work stand out.

transcription-free decipherment
end-to-end image-to-plaintext
historical cipher
handwritten manuscript analysis
Copiale cipher