🤖 AI Summary
Real-time transliteration from romanized text to native script suffers from poor accuracy and high latency for the low-resource languages covered by this Indo-Aryan shared task (Hindi, Sinhala, and Malayalam). Method: We propose the first multilingual real-time reverse transliteration system, integrating sequence-to-sequence neural modeling, rule-guided phoneme-grapheme mapping constraints, language-specific orthographic modeling, and end-to-end low-latency optimization. Results: Evaluated on a three-language benchmark, all four submitted systems significantly outperform baselines, with an average BLEU gain of ≥12.3, demonstrating the efficacy of cross-lingual shared modeling for low-resource real-time transliteration. This work establishes the first unified real-time reverse transliteration framework for multiple Indo-Aryan languages, enabling standardized, usable native-script input in keyboard-based interfaces for low-resource languages.
📝 Abstract
This paper presents an overview of the shared task on Real-Time Reverse Transliteration for Romanized Indo-Aryan languages. The task focuses on reverse transliteration of low-resource languages in the Indo-Aryan family into their native scripts. With current keyboard systems, typing these languages through ad-hoc romanized transliterations and obtaining accurate native-script output is complex and error-prone. The task aims to introduce and evaluate real-time reverse transliterators that convert Romanized Indo-Aryan text into native script, improving the typing experience for users. Of the 11 registered teams, four participated in the final evaluation phase, submitting transliteration models for Sinhala, Hindi, and Malayalam. The proposed solutions not only address the problem of ad-hoc transliteration but also improve the usability of low-resource languages in digital settings.