🤖 AI Summary
This work proposes a novel approach to automatically inducing unweighted finite-state transducers (FSTs) for string-to-string rewriting tasks by exploiting the geometric structure of hidden states in recurrent neural networks (RNNs). Constructing FSTs by hand is time-consuming and error-prone, while existing automated methods offer limited performance. The proposed method applies clustering and grammatical-inference algorithms to RNN hidden states to derive compact, accurate FSTs without explicit weights. Evaluated on morphological inflection, grapheme-to-phoneme conversion, and historical text normalization, the approach substantially outperforms classical transducer-learning algorithms, by up to 87% accuracy on held-out test sets, while remaining accurate and robust across many datasets.
📝 Abstract
Finite-State Transducers (FSTs) are effective models for string-to-string rewriting tasks, often providing the efficiency necessary for high-performance applications, but constructing transducers by hand is difficult. In this work, we propose a novel method for automatically constructing unweighted FSTs that follow the hidden-state geometry learned by a recurrent neural network. We evaluate our method on real-world datasets for morphological inflection, grapheme-to-phoneme prediction, and historical normalization, showing that the constructed FSTs are highly accurate and robust for many datasets, substantially outperforming classical transducer-learning algorithms by up to 87% accuracy on held-out test sets.
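To make the pipeline the abstract describes more concrete (run a recurrent model over input strings, discretise its hidden states into a finite set of FST states, and read transitions off the resulting state trajectories), here is a minimal self-contained sketch. Everything in it is illustrative rather than the paper's actual algorithm: the `rnn_step` update is a toy stand-in for a trained RNN cell, integer quantisation stands in for the clustering and grammatical-inference steps, and the code assumes character-aligned input/output pairs.

```python
def rnn_step(h, ch):
    """Toy deterministic recurrent update, standing in for a trained RNN cell."""
    return 0.5 * h + (ord(ch) % 7)

def cluster(h):
    """Stand-in for clustering (e.g. k-means): quantise the hidden state
    to an integer bucket; each bucket becomes one FST state."""
    return int(round(h))

def induce_fst(pairs):
    """Read (state, in_char) -> (out_char, next_state) transitions off the
    clustered hidden-state trajectories of character-aligned string pairs."""
    trans = {}
    for src, tgt in pairs:
        h = 0.0  # initial hidden state; cluster(0.0) is the FST start state
        for c_in, c_out in zip(src, tgt):
            h_next = rnn_step(h, c_in)
            trans[(cluster(h), c_in)] = (c_out, cluster(h_next))
            h = h_next
    return trans

def apply_fst(trans, s, start=0):
    """Run the induced unweighted FST on a new input string."""
    state, out = start, []
    for c in s:
        c_out, state = trans[(state, c)]
        out.append(c_out)
    return "".join(out)

# Hypothetical toy rewriting task: lowercase -> uppercase, character-aligned.
pairs = [("abc", "ABC"), ("bca", "BCA")]
fst = induce_fst(pairs)
print(apply_fst(fst, "cab"))  # -> CAB (generalises to an unseen string)
```

The key property this sketch shares with the proposed approach is that the FST's state inventory is not hand-designed: it emerges from discretising the recurrent model's hidden-state space, so strings whose hidden states fall in the same region share FST states and transitions.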