ALBA: A European Portuguese Benchmark for Evaluating Language and Linguistic Dimensions in Generative LLMs

📅 2026-03-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of fine-grained evaluation benchmarks for European Portuguese (pt-PT) by introducing ALBA, the first comprehensive assessment framework tailored to this linguistic variety. ALBA encompasses eight linguistically grounded dimensions—including language variation, cultural semantics, and discourse analysis—and features a high-quality test set manually curated by native-speaking linguists. The benchmark integrates both expert annotations and an LLM-as-a-judge automated scoring mechanism to ensure robust and scalable evaluation. Experiments across multiple mainstream large language models demonstrate that ALBA effectively uncovers performance disparities across distinct pt-PT linguistic capabilities, thereby providing a necessary and reliable tool to advance language technology development for this underrepresented variety.
📝 Abstract
As Large Language Models (LLMs) expand across multilingual domains, evaluating their performance in under-represented languages becomes increasingly important. European Portuguese (pt-PT) is particularly affected, as existing training data and benchmarks are mainly in Brazilian Portuguese (pt-BR). To address this, we introduce ALBA, a linguistically grounded benchmark designed from the ground up to assess LLM proficiency in linguistics-related tasks in pt-PT across eight linguistic dimensions: Language Variety, Culture-bound Semantics, Discourse Analysis, Word Plays, Syntax, Morphology, Lexicology, and Phonetics and Phonology. ALBA is manually constructed by language experts and paired with an LLM-as-a-judge framework for scalable evaluation of pt-PT generated language. Experiments on a diverse set of models reveal performance variability across linguistic dimensions, highlighting the need for comprehensive, variety-sensitive benchmarks that support further development of tools in pt-PT.
Problem

Research questions and friction points this paper is trying to address.

Keywords: European Portuguese, language benchmark, linguistic evaluation, under-represented languages, language variety
Innovation

Methods, ideas, or system contributions that make the work stand out.

Keywords: European Portuguese, linguistic benchmark, LLM evaluation, language variety, LLM-as-a-judge
Inês Vieira
NOVA University of Lisbon, Portugal
Inês Calvo
NOVA University of Lisbon, Portugal
Iago Paulo
NOVA University of Lisbon, Portugal; NOVA LINCS
James Furtado
NOVA University of Lisbon, Portugal; NOVA LINCS
Rafael Ferreira
PhD Student, NOVA School of Science and Technology
Conversational Agents, Machine Learning, Artificial Intelligence
Diogo Tavares
NOVA School of Science and Technology
Diogo Glória-Silva
4th Year PhD, School of Science and Technology, NOVA University
procedural plan guidance, vision and language models
David Semedo
Universidade NOVA de Lisboa
Vision and Language, Deep Learning for Multimedia, Conversational AI
João Magalhães
NOVA University of Lisbon, Portugal