Advancing Singlish Understanding: Bridging the Gap with Datasets and Multimodal Models

πŸ“… 2025-01-02
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF

career value

192K/year
πŸ€– AI Summary
This study addresses the scarcity of high-quality resources and unified modeling frameworks for Singaporean English (Singlish) spoken language understanding. We introduce MNSC, the first large-scale, standardized, multi-task speech corpus for Singlish, covering automatic speech recognition (ASR), spoken question answering, dialogue summarization, and paralinguistic understanding. We further propose SingAudioLLM, the first end-to-end multi-task multimodal model for Singlish, integrating speech-text alignment pretraining, multi-task prompt learning, and human-verified, standardized data splits. Compared to conventional cascaded systems and existing AudioLLMs, SingAudioLLM achieves 10–30% performance gains across multiple Singlish spoken language understanding tasks, establishing new state-of-the-art results. Our core contributions are: (1) the first open-source, high-quality Singlish speech benchmark; (2) the first multi-task joint audio-language modeling paradigm; and (3) systematic empirical validation of multimodal large language models’ efficacy in low-resource dialectal spoken language understanding.

Technology Category

Application Category

πŸ“ Abstract
Singlish, a Creole language rooted in English, is a key focus in linguistic research within multilingual and multicultural contexts. However, its spoken form remains underexplored, limiting insights into its linguistic structure and applications. To address this gap, we standardize and annotate the largest spoken Singlish corpus, introducing the Multitask National Speech Corpus (MNSC). These datasets support diverse tasks, including Automatic Speech Recognition (ASR), Spoken Question Answering (SQA), Spoken Dialogue Summarization (SDS), and Paralinguistic Question Answering (PQA). We release standardized splits and a human-verified test set to facilitate further research. Additionally, we propose SingAudioLLM, a multi-task multimodal model leveraging multimodal large language models to handle these tasks concurrently. Experiments reveal our models adaptability to Singlish context, achieving state-of-the-art performance and outperforming prior models by 10-30% in comparison with other AudioLLMs and cascaded solutions.
Problem

Research questions and friction points this paper is trying to address.

Singlish Recognition
Multilingual Learning
Automatic Speech Recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Singlish Oral Database
Multitask Model SingAudioLLM
Performance Improvement
πŸ”Ž Similar Papers
No similar papers found.