Historic Scripts to Modern Vision: A Novel Dataset and A VLM Framework for Transliteration of Modi Script to Devanagari

📅 2025-03-17
🤖 AI Summary
To address the urgent need to digitize endangered medieval Marathi manuscripts written in Modi script, this paper introduces MoScNet, the first end-to-end vision-language model for transliterating handwritten Modi script into Devanagari. It also presents MoDeTrans, the first publicly available paired benchmark dataset, comprising 2,043 handwritten Modi images with their Devanagari transliterations. Methodologically, MoScNet employs a hybrid CNN-Transformer encoder and a sequence-to-sequence decoder, augmented by OCR-specific pretraining and knowledge distillation: the student model retains only 1/163 of the teacher's parameters while surpassing its performance. Experiments demonstrate state-of-the-art accuracy on Modi transliteration and competitive results on the OCR task. The work offers a scalable, high-precision approach to digitizing historical manuscripts.

📝 Abstract
In medieval India, the Marathi language was written in the Modi script. Texts written in Modi script contain extensive knowledge about medieval sciences and medicines, along with land records and authentic evidence about Indian history. Around 40 million documents are in poor condition and have not yet been transliterated. Furthermore, only a few experts in this domain can transliterate this script into English or Devanagari. Most past research focuses on individual character recognition, so a system that can transliterate whole Modi script documents into Devanagari script is needed. We propose the MoDeTrans dataset, comprising 2,043 images of Modi script documents accompanied by their corresponding textual transliterations in Devanagari. We further introduce MoScNet (Modi Script Network), a novel Vision-Language Model (VLM) framework for transliterating Modi script images into Devanagari text. MoScNet leverages Knowledge Distillation, where a student model learns from a teacher model to enhance transliteration performance. The final student model of MoScNet outperforms the teacher model while having 163× fewer parameters. Our work is the first to perform direct transliteration from the handwritten Modi script to the Devanagari script. MoScNet also shows competitive results on the optical character recognition (OCR) task.
Problem

Research questions and friction points this paper is trying to address.

Transliterate Modi script to Devanagari script
Preserve medieval Indian historical and scientific knowledge
Address scarcity of experts for Modi script transliteration
Innovation

Methods, ideas, or system contributions that make the work stand out.

MoDeTrans dataset of 2,043 Modi script images paired with Devanagari transliterations
MoScNet VLM framework for script transliteration
Knowledge Distillation yields a student model that outperforms the teacher with 163× fewer parameters
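
The abstract does not say which distillation objective MoScNet uses. As a rough illustration of the general idea (a student learning from a teacher), the sketch below uses the classic soft-target formulation: a temperature-softened KL term between teacher and student predictions, mixed with ordinary cross-entropy on the ground-truth token. The function names, the temperature, and the mixing weight `alpha` are assumptions made for this example and are not taken from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, target_index,
                      temperature=2.0, alpha=0.5):
    """Generic soft-target distillation loss for one output token.

    Hypothetical illustration only: the temperature, alpha, and the loss
    form itself are assumptions, not MoScNet's published objective.
    """
    # Soft term: KL(teacher || student) on temperature-softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Hard term: cross-entropy of the student against the ground-truth token.
    hard = -math.log(softmax(student_logits)[target_index])
    # The T^2 factor keeps the soft term's gradient scale comparable across temperatures.
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard

# Toy logits over a 3-symbol vocabulary; the ground-truth symbol is index 0.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 0.8, -0.2]
loss = distillation_loss(student, teacher, target_index=0)
```

In a transliteration model this per-token loss would be averaged over every position of the Devanagari output sequence; the soft term is what lets a much smaller student absorb the teacher's full output distribution rather than only the single correct label.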
Harshal Kausadikar
Independent Researcher
Tanvi Kale
Independent Researcher
Onkar Susladikar
Yellow.AI
Sparsh Mittal
Associate Professor at ECE and Mehta Family School of DS&AI, IIT Roorkee
Deep learning, Computer vision, Machine learning, Computer architecture, Non-volatile memory