🤖 AI Summary
This work addresses the lack of fine-grained evaluation benchmarks for European Portuguese (pt-PT) by introducing ALBA, the first comprehensive assessment framework tailored to this linguistic variety. ALBA covers eight linguistically grounded dimensions—including language variation, culture-bound semantics, and discourse analysis—and features a high-quality test set manually curated by native-speaking linguists. The benchmark combines expert annotations with an LLM-as-a-judge automated scoring mechanism for robust and scalable evaluation. Experiments across multiple mainstream large language models show that ALBA effectively uncovers performance disparities across distinct pt-PT linguistic capabilities, providing a reliable tool to advance language technology for this underrepresented variety.
📝 Abstract
As Large Language Models (LLMs) expand across multilingual domains, evaluating their performance in under-represented languages becomes increasingly important. European Portuguese (pt-PT) is particularly affected, as existing training data and benchmarks are mainly in Brazilian Portuguese (pt-BR). To address this, we introduce ALBA, a linguistically grounded benchmark designed from the ground up to assess LLM proficiency on linguistic tasks in pt-PT across eight dimensions: Language Variety, Culture-bound Semantics, Discourse Analysis, Word Plays, Syntax, Morphology, Lexicology, and Phonetics and Phonology. ALBA is manually constructed by language experts and paired with an LLM-as-a-judge framework for scalable evaluation of language generated in pt-PT. Experiments on a diverse set of models reveal performance variability across linguistic dimensions, highlighting the need for comprehensive, variety-sensitive benchmarks that support further development of pt-PT language tools.