Standard-to-Dialect Transfer Trends Differ across Text and Speech: A Case Study on Intent and Topic Classification in German Dialects

📅 2025-10-09

🤖 AI Summary

This study investigates cross-dialectal transfer for intent and topic classification, from Standard German to non-standard German dialects, in three settings: text models, end-to-end speech models, and cascaded systems (automatic speech recognition followed by a text model). It introduces the first dialectal audio intent classification dataset and evaluates how well each setting generalizes across multiple dialects. End-to-end speech models achieve the best results on the dialect data. Text-only models perform best on Standard German but are outperformed by speech models on the dialects. Cascaded systems trail the text-only models on Standard German but perform relatively well on the dialects when the transcription system produces normalized, standard-like output. The core contributions are (1) the first dialectal audio intent classification dataset and (2) a controlled comparison of text, speech, and cascaded setups showing that standard-to-dialect transfer trends differ between text and speech. These findings are relevant for NLP on low-resource dialects.

📝 Abstract

Research on cross-dialectal transfer from a standard to a non-standard dialect variety has typically focused on text data. However, dialects are primarily spoken, and non-standard spellings are known to cause issues in text processing. We compare standard-to-dialect transfer in three settings: text models, speech models, and cascaded systems where speech first gets automatically transcribed and then further processed by a text model. In our experiments, we focus on German and multiple German dialects in the context of written and spoken intent and topic classification. To that end, we release the first dialectal audio intent classification dataset. We find that the speech-only setup provides the best results on the dialect data while the text-only setup works best on the standard data. While the cascaded systems lag behind the text-only models for German, they perform relatively well on the dialectal data if the transcription system generates normalized, standard-like output.
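The three settings in the abstract differ mainly in what input the classifier receives. The following is a minimal sketch under stated assumptions: the ASR system and the classifiers are hypothetical callables, and the names `cascade`, `accuracy`, and `evaluate_by_variety` are illustrative, not taken from the paper. It shows a cascaded system as the composition of a transcriber and a text classifier, and standard-to-dialect transfer read off as per-variety accuracy on separate test sets for Standard German and each dialect.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical type aliases for illustration only (not from the paper):
# audio is modeled as a sequence of float samples, labels as strings.
Audio = Sequence[float]
Transcriber = Callable[[Audio], str]        # ASR: audio -> transcript
TextClassifier = Callable[[str], str]       # text -> intent/topic label
SpeechClassifier = Callable[[Audio], str]   # audio -> intent/topic label
Example = Tuple[object, str]                # (model input, gold label)


def cascade(asr: Transcriber, text_clf: TextClassifier) -> SpeechClassifier:
    """Cascaded setup: transcribe the audio, then classify the transcript."""
    def classify(audio: Audio) -> str:
        return text_clf(asr(audio))
    return classify


def accuracy(predict: Callable[[object], str], examples: List[Example]) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    if not examples:
        return 0.0
    correct = sum(1 for x, gold in examples if predict(x) == gold)
    return correct / len(examples)


def evaluate_by_variety(
    predict: Callable[[object], str],
    test_sets: Dict[str, List[Example]],
) -> Dict[str, float]:
    """Accuracy per language variety, e.g. Standard German vs. each dialect."""
    return {name: accuracy(predict, examples) for name, examples in test_sets.items()}
```

If the text classifier is trained only on Standard German, the difference between its Standard German score and each dialect score measures the transfer loss. In the cascaded setup, that gap depends on whether the transcriber emits dialectal spellings, which a Standard German text model may not recognize, or normalized, standard-like text.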

Problem

Research questions and open problems this paper addresses.

Comparing standard-to-dialect transfer across text and speech

Analyzing intent and topic classification in German dialects

Evaluating speech and text models on dialectal data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Speech-only models outperform text models on dialect data

Cascaded systems use automatic transcription before text processing

Normalized, standard-like transcription output lets cascaded systems perform relatively well on dialect data
