🤖 AI Summary
This study addresses the challenge of dialect identification in low-resource settings, exemplified by Swiss German, where annotated speech data are scarce. It proposes a novel approach that leverages large language models (LLMs) as agents, integrating phonetic transcriptions generated by automatic speech recognition (ASR) with structured linguistic knowledge, such as dialect feature maps and vowel shift rules. The work establishes dual baselines, one for LLMs and one for human linguists, and demonstrates that incorporating explicit linguistic information substantially improves the LLM's dialect classification performance. Human evaluation indicates that the ASR-derived transcriptions are useful for this task while also highlighting their potential for further refinement. This research underscores the synergy between computational models and theoretical linguistics, offering a promising pathway for dialect recognition in data-scarce scenarios.
📝 Abstract
Due to the scarcity of labeled dialectal speech, audio dialect classification is a challenging task for most languages, including Swiss German. In this work, we explore the ability of large language models (LLMs), used as agents, to understand these dialects, and whether they can match the dialect classification performance of models such as HuBERT. In addition, we provide both an LLM baseline and a human linguist baseline. Our approach uses phonetic transcriptions produced by ASR systems and combines them with linguistic resources such as dialect feature maps, vowel history, and rules. Our findings indicate that the LLM's predictions improve when linguistic information is provided. The human baseline shows that automatically generated transcriptions can be beneficial for such classifications, but also leave room for improvement.