Small Wins Big: Comparing Large Language Models and Domain Fine-Tuned Models for Sarcasm Detection in Code-Mixed Hinglish Text

📅 2026-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenges of sarcasm detection in multilingual and code-mixed settings, where structural variation, informal expressions, and data scarcity for low-resource languages hinder performance. The authors compare large language models (Llama 3.1, Mistral, Gemma 3, and Phi-4) against a lightweight DistilBERT model fine-tuned with domain-adaptive strategies on Hindi–English code-mixed text. By augmenting the training data with a small number of LLM-generated code-mixed examples, the fine-tuned DistilBERT achieves an accuracy of 84% under low-resource conditions, substantially outperforming all of the large models evaluated in zero-shot and few-shot configurations. These results suggest that domain-specific fine-tuning of compact models is a more effective and efficient approach than relying on large language models when computational resources or annotated data are limited.
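The two-stage recipe described above (train on general sarcasm data, then continue training on a small augmented code-mixed set) can be sketched with a toy stand-in. This is not the paper's implementation: logistic regression over bag-of-words replaces DistilBERT, and all example sentences, labels, and hyperparameters are invented for illustration.

```python
import math

def grad_step(w, text, label, lr):
    """One SGD step of logistic regression on a bag-of-words feature vector."""
    words = text.lower().split()
    z = sum(w.get(t, 0.0) for t in words)          # logit = sum of word weights
    p = 1.0 / (1.0 + math.exp(-z))                 # sigmoid probability of sarcasm
    for t in words:
        w[t] = w.get(t, 0.0) - lr * (p - label)    # gradient of log-loss per word

def train(data, w=None, lr=0.3, epochs=100):
    """Train (or continue training, if w is given) on (text, label) pairs."""
    w = dict(w or {})
    for _ in range(epochs):
        for text, label in data:
            grad_step(w, text, label, lr)
    return w

def predict(w, text):
    z = sum(w.get(t, 0.0) for t in text.lower().split())
    return 1 if z > 0 else 0                       # 1 = sarcastic

# Stage 1: larger general-domain English set (hypothetical examples).
general = [
    ("oh great another monday", 1),
    ("yeah right that will definitely work", 1),
    ("i really enjoyed the concert", 0),
    ("this was a good film", 0),
]
# Stage 2: small code-mixed Hinglish set, mirroring the paper's use of a
# small amount of LLM-generated code-mixed data for domain adaptation.
code_mixed = [
    ("wah kya baat hai phir se traffic", 1),
    ("bahut accha movie thi yaar", 0),
]

w = train(general)                                 # stage 1: general fine-tuning
w = train(code_mixed, w=w, lr=0.1)                 # stage 2: sequential fine-tuning
```

The point of the sketch is the weight hand-off between stages: stage 2 starts from the stage-1 parameters rather than from scratch, which is what "sequential" fine-tuning means here and why a small code-mixed set suffices.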

📝 Abstract
Sarcasm detection in multilingual and code-mixed environments remains a challenging task for natural language processing models due to structural variation, informal expressions, and limited resources for low-resource languages. This study compares four large language models (Llama 3.1, Mistral, Gemma 3, and Phi-4) with a fine-tuned DistilBERT model for sarcasm detection in code-mixed Hinglish text. The results indicate that the smaller, sequentially fine-tuned DistilBERT model achieved the highest overall accuracy of 84%, outperforming all of the LLMs in zero-shot and few-shot setups while using only a minimal amount of LLM-generated code-mixed data for fine-tuning. These findings indicate that domain-adaptive fine-tuning of smaller transformer-based models can significantly improve sarcasm detection over general LLM inference in low-resource, data-scarce settings.
Problem

Research questions and friction points this paper is trying to address.

sarcasm detection
code-mixed
Hinglish
low-resource
multilingual
Innovation

Methods, ideas, or system contributions that make the work stand out.

sarcasm detection
code-mixed text
domain fine-tuning
DistilBERT
low-resource NLP