Hebrew Diacritics Restoration using Visual Representation

📅 2025-10-30

📈 Citations: 0

✨ Influential: 0

career value

154K/year

🤖 AI Summary

This work addresses Hebrew niqqud restoration—the task of recovering diacritical marks in unvocalized Hebrew text. We propose an end-to-end method based on vision-language models (VLMs), treating Hebrew words as images and leveraging VLMs to implicitly learn the mapping between glyph-level visual patterns and phonological annotations, thereby circumventing explicit linguistic rules. Our approach integrates context-aware dynamic candidate set generation with zero-shot classification to select the most appropriate niqqud pattern at the word level. To our knowledge, this is the first study to formulate Hebrew niqqud restoration as a joint visual-semantic understanding task, embedding phonological information directly into the visual feature space via image-based word representation. Experiments demonstrate that, under the assumption of complete candidate sets, our method achieves significantly improved generalization and restoration accuracy over conventional sequence-labeling and language-model-based baselines.

Technology Category

Application Category

📝 Abstract

Diacritics restoration in Hebrew is a fundamental task for ensuring accurate word pronunciation and disambiguating textual meaning. Despite the language's high degree of ambiguity when unvocalized, recent machine learning approaches have significantly advanced performance on this task. In this work, we present DIVRIT, a novel system for Hebrew diacritization that frames the task as a zero-shot classification problem. Our approach operates at the word level, selecting the most appropriate diacritization pattern for each undiacritized word from a dynamically generated candidate set, conditioned on the surrounding textual context. A key innovation of DIVRIT is its use of a Hebrew Visual Language Model, which processes undiacritized text as an image, allowing diacritic information to be embedded directly within the input's vector representation. Through a comprehensive evaluation across various configurations, we demonstrate that the system effectively performs diacritization without relying on complex, explicit linguistic analysis. Notably, in an ``oracle'' setting where the correct diacritized form is guaranteed to be among the provided candidates, DIVRIT achieves a high level of accuracy. Furthermore, strategic architectural enhancements and optimized training methodologies yield significant improvements in the system's overall generalization capabilities. These findings highlight the promising potential of visual representations for accurate and automated Hebrew diacritization.

Problem

Research questions and friction points this paper is trying to address.

Restoring Hebrew diacritics for accurate pronunciation and meaning

Framing diacritization as zero-shot classification using visual models

Enhancing generalization without complex linguistic analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses visual language model for text processing

Frames diacritization as zero-shot classification problem

Selects diacritization patterns from dynamic candidate sets

🔎 Similar Papers

No similar papers found.