AI Summary
Existing LLM evaluation benchmarks largely overlook the linguistic and cultural diversity of the Spanish-speaking community; in particular, no open-source assessment framework covers multiple Spanish variants (e.g., Peninsular, Latin American) together with co-official regional languages (e.g., Basque, Catalan, Galician).
Method: We introduce the first open-source LLM benchmark spanning multiple Spanish varieties and co-official languages, integrating 66 diverse datasets and systematically evaluating 50 generative models. Our approach employs a standardized, few-shot, low-compute evaluation framework to ensure reproducibility and scalability, while prioritizing community-driven curation and continuous updates.
Contribution/Results: We publicly release a fully accessible, regularly updated leaderboard, the first of its kind, filling a critical gap in Spanish-language AI evaluation. This benchmark establishes an authoritative, inclusive, and extensible standard for developing and assessing regionally grounded language models, supporting equitable advancement of multilingual and multicultural AI across the Spanish-speaking world.
Abstract
Leaderboards showcase the current capabilities and limitations of Large Language Models (LLMs). To motivate the development of LLMs that represent the linguistic and cultural diversity of the Spanish-speaking community, we present La Leaderboard, the first open-source leaderboard to evaluate generative LLMs in languages and language varieties of Spain and Latin America. La Leaderboard is a community-driven project that aims to establish an evaluation standard for everyone interested in developing LLMs for the Spanish-speaking community. This initial version combines 66 datasets in Basque, Catalan, Galician, and different Spanish varieties, showcasing the evaluation results of 50 models. To encourage community-driven development of leaderboards in other languages, we explain our methodology, including guidance on selecting the most suitable evaluation setup for each downstream task. In particular, we provide a rationale for using fewer few-shot examples than typically found in the literature, aiming to reduce environmental impact and facilitate access to reproducible results for a broader research community.
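The low-shot setup argued for above can be illustrated with a minimal sketch of k-shot prompt construction. This is not the leaderboard's actual evaluation harness; the helper function, example texts, and labels below are invented for demonstration only.

```python
# Illustrative sketch only: La Leaderboard's real pipeline uses a
# community evaluation harness; this helper and its demo data are
# hypothetical, showing why a small k shortens prompts and compute.

def build_few_shot_prompt(examples, query, k=2):
    """Join the first k labelled examples with the query instance.

    Using a small k (e.g. 2 instead of the 5+ shots common in the
    literature) yields shorter prompts, reducing the compute needed
    to reproduce results.
    """
    shots = "\n\n".join(
        f"Texto: {x}\nEtiqueta: {y}" for x, y in examples[:k]
    )
    return f"{shots}\n\nTexto: {query}\nEtiqueta:"

# Invented sentiment-style examples in Spanish.
demo = [
    ("La película fue estupenda.", "positivo"),
    ("El servicio fue pésimo.", "negativo"),
    ("No está mal, pero esperaba más.", "neutro"),
]
prompt = build_few_shot_prompt(demo, "Me encantó la comida.", k=2)
print(prompt)
```

With k=2, only the first two demonstrations enter the prompt, so prompt length (and hence evaluation cost) scales down directly with the shot count.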