A Collection of Question Answering Datasets for Norwegian

📅 2025-01-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of multi-task evaluation benchmarks for Norwegian language models. It introduces a suite of four question-answering datasets covering both written standards of Norwegian, Bokmål and Nynorsk. The datasets test world knowledge, commonsense reasoning, truthfulness, and Norway-specific knowledge, and together comprise over 10,000 question-answer pairs created by native speakers. Eleven language models are evaluated in zero- and few-shot settings. Most models perform worse on Nynorsk than on Bokmål, struggle most with commonsense reasoning, and often generate untruthful answers. All datasets and annotation materials are publicly released to support the evaluation of Norwegian language models.

📝 Abstract
This paper introduces a new suite of question answering datasets for Norwegian: NorOpenBookQA, NorCommonSenseQA, NorTruthfulQA, and NRK-Quiz-QA. The data covers a wide range of skills and knowledge domains, including world knowledge, commonsense reasoning, truthfulness, and knowledge about Norway. Covering both written standards of Norwegian (Bokmål and Nynorsk), our datasets comprise over 10k question-answer pairs created by native speakers. We detail our dataset creation approach and present the results of evaluating 11 language models (LMs) in zero- and few-shot regimes. Most LMs perform better in Bokmål than in Nynorsk, struggle most with commonsense reasoning, and are often untruthful when generating answers to questions. All our datasets and annotation materials are publicly available.

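The abstract describes zero- and few-shot evaluation and a comparison between Bokmål and Nynorsk. The sketch below shows one common way such a multiple-choice evaluation can be set up; it is not the authors' evaluation code. The `MCExample` fields, the Norwegian prompt template ("Spørsmål"/"Svar"), and the `choose` callable are all hypothetical. `choose` stands in for a real model, for example one that picks the answer option it assigns the highest likelihood.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

LETTERS = "ABCDE"


@dataclass
class MCExample:
    """Hypothetical multiple-choice item; field names are illustrative only."""
    question: str
    choices: Sequence[str]
    answer: int    # index of the correct choice
    standard: str  # written standard: "nb" (Bokmål) or "nn" (Nynorsk)


def format_item(ex: MCExample, with_answer: bool) -> str:
    # Render the question and lettered options (hypothetical template);
    # solved demonstrations also get the gold answer letter.
    lines = [f"Spørsmål: {ex.question}"]
    lines += [f"{LETTERS[i]}. {c}" for i, c in enumerate(ex.choices)]
    lines.append("Svar:" + (f" {LETTERS[ex.answer]}" if with_answer else ""))
    return "\n".join(lines)


def build_prompt(target: MCExample, demos: Sequence[MCExample], k: int) -> str:
    # k = 0 yields a zero-shot prompt; k > 0 prepends k solved demonstrations.
    parts = [format_item(d, with_answer=True) for d in demos[:k]]
    parts.append(format_item(target, with_answer=False))
    return "\n\n".join(parts)


def accuracy_by_standard(
    test: Sequence[MCExample],
    demos: Sequence[MCExample],
    k: int,
    choose: Callable[[str, int], int],
) -> Dict[str, float]:
    # `choose(prompt, n_choices)` is a stand-in for a model returning a choice index.
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for ex in test:
        pred = choose(build_prompt(ex, demos, k), len(ex.choices))
        total[ex.standard] += 1
        correct[ex.standard] += int(pred == ex.answer)
    return {s: correct[s] / total[s] for s in total}


if __name__ == "__main__":
    demo = MCExample("Hva er hovedstaden i Norge?", ["Oslo", "Bergen"], 0, "nb")
    test = [
        MCExample("Kva er den lengste elva i Noreg?", ["Glomma", "Otta"], 0, "nn"),
        MCExample("Hvilket hav ligger vest for Norge?", ["Østersjøen", "Norskehavet"], 1, "nb"),
    ]

    def always_first(prompt: str, n_choices: int) -> int:
        # Trivial baseline "model" that always picks option A.
        return 0

    print(accuracy_by_standard(test, [demo], k=1, choose=always_first))
```

Reporting accuracy separately per written standard is what makes a Bokmål vs. Nynorsk gap visible; a single pooled score would hide it.
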
Problem

Norwegian Language
Question Answering Dataset
World Knowledge
Innovation

Norwegian QA Datasets
Coverage of Both Written Standards (Bokmål and Nynorsk)
Zero-shot and Few-shot Learning