Automatic Speech Recognition for Documenting Endangered Languages: Case Study of Ikema Miyakoan

📅 2026-03-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the urgent documentation needs of Ikema Miyakoan, an endangered language facing severe intergenerational transmission challenges, by developing the first end-to-end automatic speech recognition (ASR) system for it. Using a newly compiled corpus of several hours of field-recorded speech, the authors train and optimize ASR models under extremely low-resource conditions. The best model reaches a character error rate as low as 15%, and ASR-assisted transcription substantially reduces both the time and the cognitive load of manual transcription. The findings underscore the practical utility and scalability of ASR for documenting endangered languages and offer a workable approach for other similarly resource-constrained languages.
📝 Abstract
Language endangerment poses a major challenge to linguistic diversity worldwide, and technological advances have opened new avenues for documentation and revitalization. Among these, automatic speech recognition (ASR) has shown increasing potential to assist in the transcription of endangered language data. This study focuses on Ikema, a severely endangered Ryukyuan language spoken in Okinawa, Japan, with approximately 1,300 remaining speakers, most of whom are over 60 years old. We present an ongoing effort to develop an ASR system for Ikema based on field recordings. Specifically, we (1) construct a speech corpus from field recordings, (2) train an ASR model that achieves a character error rate as low as 15%, and (3) evaluate the impact of ASR assistance on the efficiency of speech transcription. Our results demonstrate that ASR integration can substantially reduce transcription time and cognitive load, offering a practical pathway toward scalable, technology-supported documentation of endangered languages.
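For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how a character error rate (CER) is commonly computed: the character-level Levenshtein distance between a reference transcript and the ASR hypothesis, divided by the reference length. This is the conventional definition, not necessarily the exact normalization or scoring tool used in the paper. The function names and the placeholder strings are hypothetical and are not Ikema data.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of reference characters (assumed convention)."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Placeholder strings only: 20 reference characters, 3 substituted in the hypothesis.
reference = "abcdefghijklmnopqrst"
hypothesis = "abXdefgYijklmnopqrZt"
cer = character_error_rate(reference, hypothesis)
print(f"CER = {cer:.2f}")
```

Under this definition, a CER of 15% corresponds to roughly three character-level corrections per twenty reference characters, which gives a sense of the post-editing effort left for a human transcriber.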
Problem

Research questions and friction points this paper is trying to address.

endangered languages
automatic speech recognition
language documentation
Ikema Miyakoan
speech transcription
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatic Speech Recognition
Endangered Languages
Ikema Miyakoan
Speech Corpus
Transcription Efficiency