🤖 AI Summary
This study investigates the efficacy of large language models (LLMs) versus traditional lexicon-based tools (LIWC, Pattern) in detecting sentiment valence in low-resource dialectal Dutch, specifically Belgian Dutch (Flemish), using authentic, spontaneous narrative data. Methodologically, we benchmark three Dutch-specialized LLMs (ChocoLlama-8B-Instruct, Reynaerde-7B-chat, GEITje-7B-ultra) against lexicon-based approaches on a corpus of ~25,000 real-world user texts. Results reveal that Pattern significantly outperforms all LLMs in valence classification accuracy, challenging the prevailing assumption of LLMs' universal superiority in sentiment analysis. This suggests systemic limitations in current LLM fine-tuning paradigms for modeling culturally embedded, informal affective expressions. The study contributes a reproducible, dialect-specific evaluation benchmark and advocates for linguistically and culturally grounded assessment frameworks for low-resource sentiment analysis, offering methodological caution against uncritical LLM adoption in sociolinguistically complex settings.
📝 Abstract
Understanding emotional nuances in everyday language is crucial for computational linguistics and emotion research. While traditional lexicon-based tools like LIWC and Pattern have served as foundational instruments, Large Language Models (LLMs) promise enhanced contextual understanding. We evaluated three Dutch-specific LLMs (ChocoLlama-8B-Instruct, Reynaerde-7B-chat, and GEITje-7B-ultra) against LIWC and Pattern for valence prediction in Flemish, a low-resource language variant. Our dataset comprised approximately 25,000 spontaneous textual responses from 102 Dutch-speaking participants, each providing narratives about their current experiences accompanied by self-assessed valence ratings (−50 to +50). Surprisingly, despite architectural advancements, the Dutch-tuned LLMs underperformed relative to the traditional methods, with Pattern showing the strongest performance. These findings challenge assumptions about LLM superiority in sentiment analysis tasks and highlight the difficulty of capturing emotional valence in spontaneous, real-world narratives. Our results underscore the need for culturally and linguistically tailored evaluation frameworks for low-resource language variants, while questioning whether current LLM fine-tuning approaches adequately address the nuanced emotional expressions found in everyday language use.
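To make the comparison concrete, lexicon-based tools such as Pattern typically report a polarity score in [−1, 1], whereas participants here rated valence on a −50 to +50 scale. The sketch below is a minimal, hypothetical illustration of that pipeline: the toy lexicon and the linear rescaling are assumptions for exposition, not the actual Pattern lexicon or the study's evaluation code.

```python
# Minimal sketch of lexicon-based valence scoring, loosely analogous to a
# Pattern-style approach: average the polarities of matched words (each in
# [-1, 1]), then linearly rescale to the study's self-assessed valence
# scale of [-50, +50]. TOY_LEXICON is hypothetical, for illustration only.
TOY_LEXICON = {
    "goed": 0.7,     # "good"
    "blij": 0.8,     # "happy"
    "slecht": -0.7,  # "bad"
    "moe": -0.4,     # "tired"
}

def lexicon_valence(text: str) -> float:
    """Average the polarity of lexicon words found in the text (0.0 if none)."""
    hits = [TOY_LEXICON[w] for w in text.lower().split() if w in TOY_LEXICON]
    return sum(hits) / len(hits) if hits else 0.0

def to_study_scale(polarity: float) -> float:
    """Linearly map a polarity in [-1, 1] onto the [-50, +50] valence scale."""
    return polarity * 50.0

print(to_study_scale(lexicon_valence("ik ben blij en goed")))  # → 37.5
```

A simple linear mapping like this is one way to put lexicon polarities and self-report ratings on a common scale before computing agreement; the study's actual alignment procedure may differ.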