skLEP: A Slovak General Language Understanding Benchmark

📅 2025-06-26
🤖 AI Summary
Problem: Slovak lacks a comprehensive, standardized natural language understanding (NLU) evaluation benchmark, which hinders systematic assessment and fair comparison of language models.
Method: We introduce skLEP, the first dedicated Slovak NLU benchmark, comprising nine tasks that span token-level, sentence-pair, and document-level granularities. skLEP combines original Slovak data with high-fidelity English-to-Slovak translations, validated through expert annotation and quality control.
Contribution/Results: skLEP establishes the first standardized, open-source evaluation framework for Slovak, including the full dataset, a unified training and evaluation toolkit, and a live public leaderboard. Using skLEP, we conduct the most extensive empirical evaluation to date of monolingual, multilingual, and English pretrained language models on Slovak NLU tasks. This work fills a gap in Slovak NLU infrastructure and improves reproducibility and comparability in low-resource language modeling research.

📝 Abstract
In this work, we introduce skLEP, the first comprehensive benchmark specifically designed for evaluating Slovak natural language understanding (NLU) models. We have compiled skLEP to encompass nine diverse tasks that span token-level, sentence-pair, and document-level challenges, thereby offering a thorough assessment of model capabilities. To create this benchmark, we curated new, original datasets tailored for Slovak and meticulously translated established English NLU resources. Within this paper, we also present the first systematic and extensive evaluation of a wide array of Slovak-specific, multilingual, and English pre-trained language models using the skLEP tasks. Finally, we release the complete benchmark data, an open-source toolkit facilitating both fine-tuning and evaluation of models, and a public leaderboard at https://github.com/slovak-nlp/sklep in the hopes of fostering reproducibility and driving future research in Slovak NLU.
Problem

Research questions and friction points this paper is trying to address.

Introducing skLEP, the first Slovak NLU benchmark
Evaluating diverse Slovak and multilingual models
Providing open toolkit for Slovak NLU research
Innovation

Methods, ideas, or system contributions that make the work stand out.

First Slovak NLU benchmark with diverse tasks
Curated and translated datasets for Slovak
Open-source toolkit and public leaderboard provided
Authors
Marek Šuppa
Comenius University in Bratislava
Natural Language Processing, Computer Vision, Machine Learning
Andrej Ridzik
Kempelen Institute of Intelligent Technologies, Bratislava, Slovakia
Daniel Hládek
Technical University of Košice, Slovakia
Tomáš Javůrek
Kempelen Institute of Intelligent Technologies, Bratislava, Slovakia
Viktória Ondrejová
Comenius University in Bratislava, Slovakia; Cisco Systems
Kristína Sásiková
Comenius University in Bratislava, Slovakia
Martin Tamajka
Kempelen Institute of Intelligent Technologies, Bratislava, Slovakia
Marián Šimko
Kempelen Institute of Intelligent Technologies, Bratislava, Slovakia