From Human Judgements to Predictive Models: Unravelling Acceptability in Code-Mixed Sentences

📅 2024-05-09
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of naturalness modeling for code-mixed text by proposing a human acceptability–driven quality modeling and controllable generation framework. We introduce Cline, the first large-scale English–Indic bilingual code-mixing acceptability dataset with human annotations (16,642 sentences), and empirically demonstrate that conventional metrics—such as the Code-Mixing Index (CMI) and switch-point count—exhibit weak correlation with human judgments. Fine-tuning multilingual models (XLM-RoBERTa and BERNICE) significantly outperforms IndicBERT and MLP baselines; notably, our approach achieves zero-shot cross-lingual transfer superior to random baselines (e.g., en-hi → en-te). It also substantially surpasses ChatGPT under zero- and few-shot prompting settings. All data, models, and code are publicly released to foster reproducible research.

📝 Abstract
Current computational approaches for analysing or generating code-mixed sentences do not explicitly model the "naturalness" or "acceptability" of code-mixed sentences, but rely on training corpora to reflect the distribution of acceptable code-mixed sentences. Modelling human judgement of the acceptability of code-mixed text can help distinguish natural code-mixed text and enable quality-controlled generation of code-mixed text. To this end, we construct Cline - a dataset containing human acceptability judgements for English-Hindi (en-hi) code-mixed text. Cline is the largest of its kind with 16,642 sentences, drawn from two sources: synthetically generated code-mixed text and samples collected from online social media. Our analysis establishes that popular code-mixing metrics such as CMI, number of switch points, and burstiness, which are used to filter/curate/compare code-mixed corpora, have low correlation with human acceptability judgements, underlining the necessity of our dataset. Experiments using Cline demonstrate that simple Multilayer Perceptron (MLP) models trained solely on code-mixing metrics are outperformed by fine-tuned pre-trained Multilingual Large Language Models (MLLMs). Specifically, XLM-RoBERTa and BERNICE outperform IndicBERT across different configurations in challenging data settings. Comparison with ChatGPT's zero- and few-shot capabilities shows that MLLMs fine-tuned on larger data outperform ChatGPT, indicating scope for improvement on code-mixed tasks. Zero-shot transfer of acceptability judgements from English-Hindi to English-Telugu using our model checkpoints proves superior to random baselines, enabling application to other code-mixed language pairs and opening further avenues of research. We publicly release our human-annotated dataset, trained checkpoints, code-mixed corpus, and code for data generation and model training.
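For context, the surface metrics the paper tests against human judgements can be sketched as follows. This is a minimal illustration, not the paper's released code: the per-token language tags, the tag labels ("en", "hi"), and the "other" category for language-independent tokens are all assumptions. CMI follows the common Das & Gambäck formulation, 100 × (1 − dominant-language tokens / language-tagged tokens).

```python
from collections import Counter

def cmi(tags):
    """Code-Mixing Index over per-token language tags.
    Tokens tagged "other" (language-independent) are excluded;
    returns 0.0 for monolingual or empty input."""
    lang_tags = [t for t in tags if t != "other"]
    if not lang_tags:
        return 0.0
    dominant = Counter(lang_tags).most_common(1)[0][1]
    return 100.0 * (1 - dominant / len(lang_tags))

def switch_points(tags):
    """Count positions where the language changes between
    consecutive language-tagged tokens."""
    lang_tags = [t for t in tags if t != "other"]
    return sum(a != b for a, b in zip(lang_tags, lang_tags[1:]))

# Toy sentence: 3 Hindi tokens, 3 English tokens, 1 neutral token.
tags = ["hi", "hi", "en", "en", "hi", "other", "en"]
print(cmi(tags))           # 50.0 (evenly mixed)
print(switch_points(tags)) # 3
```

Both quantities depend only on the token-level language sequence, which is exactly why the paper argues they cannot capture acceptability: two sentences with identical CMI and switch-point counts can differ sharply in how natural they sound to bilingual speakers.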
Problem

Research questions and friction points this paper is trying to address.

Modeling human judgement for code-mixed text acceptability
Evaluating correlation between code-mixing metrics and human judgements
Comparing performance of ML models on code-mixed acceptability tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Construct Cline dataset for human acceptability judgements
Fine-tune pre-trained MLLMs to outperform metric-only MLP baselines and ChatGPT's zero-/few-shot prompting
Zero-shot transfer of en-hi acceptability checkpoints to en-te