La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America

📅 2025-07-01
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing LLM evaluation benchmarks largely overlook the linguistic and cultural diversity of the Spanish-speaking community; in particular, no open-source assessment framework covers multiple Spanish varieties (e.g., Peninsular, Latin American) together with co-official regional languages (e.g., Basque, Catalan, Galician). Method: We introduce the first open-source, multi-variety Spanish-language LLM benchmark, integrating 66 diverse datasets and systematically evaluating 50 generative models. Our approach employs a standardized, few-shot, low-compute evaluation framework to ensure reproducibility and scalability, while prioritizing community-driven curation and continuous updates. Contribution/Results: We publicly release a fully accessible, regularly updated leaderboard, the first of its kind, filling a critical gap in Spanish-language AI evaluation. This benchmark establishes an inclusive, extensible standard for developing and assessing regionally grounded language models, supporting equitable advancement of multilingual and multicultural AI across the Spanish-speaking world.

📝 Abstract
Leaderboards showcase the current capabilities and limitations of Large Language Models (LLMs). To motivate the development of LLMs that represent the linguistic and cultural diversity of the Spanish-speaking community, we present La Leaderboard, the first open-source leaderboard to evaluate generative LLMs in languages and language varieties of Spain and Latin America. La Leaderboard is a community-driven project that aims to establish an evaluation standard for everyone interested in developing LLMs for the Spanish-speaking community. This initial version combines 66 datasets in Basque, Catalan, Galician, and different Spanish varieties, showcasing the evaluation results of 50 models. To encourage community-driven development of leaderboards in other languages, we explain our methodology, including guidance on selecting the most suitable evaluation setup for each downstream task. In particular, we provide a rationale for using fewer few-shot examples than typically found in the literature, aiming to reduce environmental impact and facilitate access to reproducible results for a broader research community.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs for Spanish varieties and languages in Spain and Latin America
Establishing a community-driven evaluation standard for Spanish-speaking LLMs
Reducing environmental impact by optimizing few-shot evaluation setups
Innovation

Methods, ideas, or system contributions that make the work stand out.

First open-source leaderboard for Spanish varieties
Combines 66 datasets in multiple languages
Uses fewer few-shot examples for efficiency
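The efficiency argument behind the last point is easy to see concretely: every in-context example is prepended to every evaluation item, so reducing the shot count shrinks every prompt sent to the model. The sketch below is a hypothetical illustration (the example data, labels, and helper name are invented, not code from La Leaderboard):

```python
# Illustrative sketch (hypothetical data; not La Leaderboard's actual code):
# build a k-shot classification prompt and compare prompt sizes as k shrinks.

def build_kshot_prompt(examples, query, k):
    """Concatenate the first k labeled examples before the unlabeled query."""
    lines = [f"Text: {text}\nLabel: {label}" for text, label in examples[:k]]
    lines.append(f"Text: {query}\nLabel:")
    return "\n\n".join(lines)

# Invented Spanish sentiment examples, purely for illustration.
examples = [
    ("El servicio fue excelente.", "positivo"),
    ("La comida llegó fría.", "negativo"),
    ("Repetiría sin dudarlo.", "positivo"),
    ("No volveré a este lugar.", "negativo"),
    ("Atención rápida y amable.", "positivo"),
]
query = "El ambiente era agradable."

long_prompt = build_kshot_prompt(examples, query, k=5)
short_prompt = build_kshot_prompt(examples, query, k=1)

# Fewer shots means fewer tokens per request; across thousands of
# evaluation items this compounds into real compute and energy savings.
print(len(long_prompt), len(short_prompt))
```

The saving scales linearly with both the shot count and the number of evaluation items, which is why a lower-shot setup makes results cheaper to reproduce.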
María Grandury
SomosNLP / Universidad Politécnica de Madrid
Natural Language Processing · LLM Evaluation
Javier Aula-Blasco
Barcelona Supercomputing Center
Júlia Falcão
Barcelona Supercomputing Center (BSC)
NLP · AI ethics · bias · LLM evaluation
Clémentine Fourrier
HuggingFace
Miguel González
ETSIT, Universidad Politécnica de Madrid
Gonzalo Martínez
Universidad Carlos III de Madrid
Gonzalo Santamaría
Instituto de Ingeniería del Conocimiento
Rodrigo Agerri
HiTZ Center - Ixa, University of the Basque Country UPV/EHU
Natural Language Processing
Nuria Aldama
Instituto de Ingeniería del Conocimiento
Luis Chiruzzo
Universidad de la República
Natural Language Processing · Artificial Intelligence
Javier Conde
ETSIT, Universidad Politécnica de Madrid
Helena Gómez
Universidad Nacional Autónoma de México
Marta Guerrero
Instituto de Ingeniería del Conocimiento
Guido Ivetta
Universidad Nacional de Córdoba, Argentina / Fundación Vía Libre
Calibration · Bias in LLMs
Natalia López
Universidad Complutense de Madrid
Flor Miriam Plaza-del-Arco
Assistant Professor, Leiden University
Natural Language Processing · Computational Social Science · Online harms · Affective Computing · Ethics
María Teresa Martín-Valdivia
Universidad de Jaén
Helena Montoro
Instituto de Ingeniería del Conocimiento
Carmen Muñoz
Instituto de Ingeniería del Conocimiento
Pedro Reviriego
ETSIT, Universidad Politécnica de Madrid
Leire Rosado
Instituto de Ingeniería del Conocimiento
Alejandro Vaca
LenguajeNatural.AI
María Estrella Vallecillo-Rodríguez
Universidad de Jaén
Jorge Vallego
Independent Researcher
Artificial Intelligence · Ecolinguistics
Irune Zubiaga
HiTZ Zentroa
Natural Language Processing