🤖 AI Summary
This work addresses the scarcity of high-quality spoken language understanding resources for Tunisian Arabic, which has hindered its deployment in task-oriented dialogue systems. To bridge this gap, the authors introduce SLURP-TN, the first spoken language understanding dataset for the Tunisian dialect, comprising approximately five hours of speech from 55 native speakers across six domains, totaling 4,165 annotated utterances. Data quality is ensured through a combination of human translation and native-speaker recordings. The study further develops baseline automatic speech recognition (ASR) and spoken language understanding (SLU) systems leveraging deep neural networks and pretrained language models. Both the dataset and the associated models are publicly released, filling a critical gap in structured SLU benchmarks for low-resource dialects and advancing research on and applications of Tunisian Arabic in spoken interactive systems.
📝 Abstract
Spoken Language Understanding (SLU) aims to extract semantic information from spoken user queries and is a core component of task-oriented dialogue systems. With the spectacular progress of deep neural network models and the evolution of pre-trained language models, SLU has achieved significant breakthroughs. However, only a few high-resource languages have benefited from this progress, owing to the absence of SLU resources for other languages. In this paper, we seek to mitigate this obstacle by introducing SLURP-TN. This dataset was created by recording 55 native speakers uttering sentences in the Tunisian dialect, manually translated from six SLURP domains. The result is a Tunisian-dialect SLU dataset comprising 4,165 sentences recorded as around 5 hours of acoustic material. We also develop a number of Automatic Speech Recognition (ASR) and SLU models exploiting SLURP-TN. The dataset and baseline models are available at: https://huggingface.co/datasets/Elyadata/SLURP-TN.