Advancing Singlish Understanding: Bridging the Gap with Datasets and Multimodal Models

πŸ“… 2025-01-02
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This study addresses the scarcity of high-quality resources and unified modeling frameworks for Singaporean English (Singlish) spoken language understanding. We introduce MNSC, the first large-scale, standardized, multi-task speech corpus for Singlish, covering automatic speech recognition (ASR), spoken question answering, dialogue summarization, and paralinguistic understanding. We further propose SingAudioLLM, the first end-to-end multi-task multimodal model for Singlish, integrating speech-text alignment pretraining, multi-task prompt learning, and human-verified, standardized data splits. Compared to conventional cascaded systems and existing AudioLLMs, SingAudioLLM achieves 10–30% performance gains across multiple Singlish spoken language understanding tasks, establishing new state-of-the-art results. Our core contributions are: (1) the first open-source, high-quality Singlish speech benchmark; (2) the first multi-task joint audio-language modeling paradigm; and (3) systematic empirical validation of multimodal large language models’ efficacy in low-resource dialectal spoken language understanding.


πŸ“ Abstract
Singlish, a creole language rooted in English, is a key focus of linguistic research in multilingual and multicultural contexts. However, its spoken form remains underexplored, limiting insights into its linguistic structure and applications. To address this gap, we standardize and annotate the largest spoken Singlish corpus, introducing the Multitask National Speech Corpus (MNSC). These datasets support diverse tasks, including Automatic Speech Recognition (ASR), Spoken Question Answering (SQA), Spoken Dialogue Summarization (SDS), and Paralinguistic Question Answering (PQA). We release standardized splits and a human-verified test set to facilitate further research. Additionally, we propose SingAudioLLM, a multi-task multimodal model that leverages multimodal large language models to handle these tasks concurrently. Experiments demonstrate our model's adaptability to the Singlish context: it achieves state-of-the-art performance, outperforming other AudioLLMs and cascaded solutions by 10–30%.
Problem

Research questions and friction points this paper is trying to address.

Singlish Recognition
Multilingual Learning
Automatic Speech Recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Singlish Oral Database
Multitask Model SingAudioLLM
Performance Improvement
πŸ”Ž Similar Papers
No similar papers found.
Bin Wang
Institute for Infocomm Research (I2R), A*STAR, Singapore
Xunlong Zou
Institute for Infocomm Research (I2R), A*STAR, Singapore
Shuo Sun
Johns Hopkins University
Wenyu Zhang
Institute for Infocomm Research (I2R), A*STAR, Singapore
Yingxu He
Institute for Infocomm Research (I2R), A*STAR, Singapore
Zhuohan Liu
Research Engineer
Chengwei Wei
Research Scientist, Institute for Infocomm Research, A*STAR (Natural Language Processing)
Nancy F. Chen
ISCA Fellow, AAIA Fellow, Multimodal Generative AI Group Leader, AI for Education Head at A*STAR (Agentic AI, Large Language Models, Conversational AI)
AiTi Aw