LLMs vs. Traditional Sentiment Tools in Psychology: An Evaluation on Belgian-Dutch Narratives

📅 2025-11-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study investigates the efficacy of large language models (LLMs) versus traditional lexicon-based tools (LIWC, Pattern) in detecting sentiment valence within low-resource dialectal Dutch—specifically Belgian Dutch (Flemish)—using authentic, spontaneous narrative data. Methodologically, we benchmark three Dutch-specialized LLMs (ChocoLlama-8B-Instruct, Reynaerde-7B-chat, GEITje-7B-ultra) against lexicon-based approaches on a corpus of ~25,000 real-world user texts. Results reveal that Pattern significantly outperforms all LLMs in valence classification accuracy, challenging the prevailing assumption of LLMs’ universal superiority in sentiment analysis. This suggests systemic limitations in current LLM fine-tuning paradigms for modeling culturally embedded, informal affective expressions. The study contributes a reproducible, dialect-specific evaluation benchmark and advocates for linguistically and culturally grounded assessment frameworks for low-resource sentiment analysis, offering methodological caution against uncritical LLM adoption in sociolinguistically complex settings.

📝 Abstract
Understanding emotional nuances in everyday language is crucial for computational linguistics and emotion research. While traditional lexicon-based tools like LIWC and Pattern have served as foundational instruments, Large Language Models (LLMs) promise enhanced context understanding. We evaluated three Dutch-specific LLMs (ChocoLlama-8B-Instruct, Reynaerde-7B-chat, and GEITje-7B-ultra) against LIWC and Pattern for valence prediction in Flemish, a low-resource language variant. Our dataset comprised approximately 25,000 spontaneous textual responses from 102 Dutch-speaking participants, each providing narratives about their current experiences with self-assessed valence ratings (-50 to +50). Surprisingly, despite architectural advancements, the Dutch-tuned LLMs underperformed compared to traditional methods, with Pattern showing superior performance. These findings challenge assumptions about LLM superiority in sentiment analysis tasks and highlight the complexity of capturing emotional valence in spontaneous, real-world narratives. Our results underscore the need for developing culturally and linguistically tailored evaluation frameworks for low-resource language variants, while questioning whether current LLM fine-tuning approaches adequately address the nuanced emotional expressions found in everyday language use.
Problem

Research questions and friction points this paper is trying to address.

Evaluating Dutch LLMs versus traditional tools for sentiment analysis
Assessing emotional valence prediction in low-resource Flemish language
Comparing performance on spontaneous real-world narrative datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Evaluated Dutch-tuned LLMs against traditional lexicon tools
Used Flemish spontaneous narratives with self-assessed valence ratings
Found traditional methods outperformed LLMs in valence prediction
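The comparison described above could be sketched as a simple class-based agreement check. The paper does not publish its exact evaluation code; the neutral-band thresholds, function names, and toy numbers below are illustrative assumptions, mapping self-assessed valence ratings on [-50, +50] and tool polarity scores on [-1, +1] into coarse negative/neutral/positive classes before computing accuracy.

```python
# Hypothetical sketch of a valence classification evaluation.
# Assumptions (not from the paper): continuous scores are binned into
# negative/neutral/positive with a small neutral band, then compared.

def to_class(score: float, neutral_band: float) -> str:
    """Bin a continuous valence score into a coarse sentiment class."""
    if score > neutral_band:
        return "positive"
    if score < -neutral_band:
        return "negative"
    return "neutral"

def valence_accuracy(self_ratings, tool_polarities,
                     rating_band=5.0, polarity_band=0.1):
    """Fraction of texts where the tool's class matches the self-assessed class."""
    assert len(self_ratings) == len(tool_polarities)
    hits = sum(
        to_class(r, rating_band) == to_class(p, polarity_band)
        for r, p in zip(self_ratings, tool_polarities)
    )
    return hits / len(self_ratings)

# Toy illustration with made-up numbers:
ratings = [32.0, -18.0, 2.0, 45.0]        # self-assessed, [-50, +50]
polarities = [0.6, -0.4, 0.05, -0.2]      # tool output, [-1, +1]
print(valence_accuracy(ratings, polarities))  # → 0.75
```

In this sketch, a lexicon tool like Pattern (whose `sentiment()` function returns polarity in [-1, +1]) and an LLM prompted for a valence score could be scored under the same binning, which is one plausible way to make the tools directly comparable.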