A Collection of Question Answering Datasets for Norwegian

📅 2025-01-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of broad evaluation benchmarks for Norwegian large language models (LLMs). It introduces a four-part question answering benchmark that covers both written standards of Norwegian, Bokmål and Nynorsk, and spans world knowledge, commonsense reasoning, truthfulness, and Norway-specific knowledge. The datasets comprise over 10,000 question-answer pairs created by native speakers. The authors evaluate 11 language models in zero-shot and few-shot settings. Most models perform better in Bokmål than in Nynorsk, struggle most with commonsense reasoning, and often generate untruthful answers. All datasets and annotation materials are publicly released.

📝 Abstract

This paper introduces a new suite of question answering datasets for Norwegian: NorOpenBookQA, NorCommonSenseQA, NorTruthfulQA, and NRK-Quiz-QA. The data covers a wide range of skills and knowledge domains, including world knowledge, commonsense reasoning, truthfulness, and knowledge about Norway. Covering both of the written standards of Norwegian, Bokmål and Nynorsk, our datasets comprise over 10k question-answer pairs, created by native speakers. We detail our dataset creation approach and present the results of evaluating 11 language models (LMs) in zero- and few-shot regimes. Most LMs perform better in Bokmål than Nynorsk, struggle most with commonsense reasoning, and are often untruthful in generating answers to questions. All our datasets and annotation materials are publicly available.
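
To make the zero- and few-shot regimes mentioned in the abstract concrete, the following is a minimal sketch of how a multiple-choice prompt might be assembled with and without demonstrations. It is not the authors' evaluation code: the `MCExample` fields, the Norwegian prompt labels ("Spørsmål", "Svar"), the letter-based answer format, and the sample questions are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence

LETTERS = "ABCDEFGH"


@dataclass
class MCExample:
    """Hypothetical multiple-choice item; field names are illustrative only."""
    question: str
    choices: List[str]
    answer_index: int


def format_item(ex: MCExample, with_answer: bool) -> str:
    """Render one item; the gold letter is appended only for demonstrations."""
    lines = [f"Spørsmål: {ex.question}"]
    for letter, choice in zip(LETTERS, ex.choices):
        lines.append(f"{letter}. {choice}")
    answer = f" {LETTERS[ex.answer_index]}" if with_answer else ""
    lines.append(f"Svar:{answer}")
    return "\n".join(lines)


def build_prompt(target: MCExample, demos: Sequence[MCExample] = ()) -> str:
    """Zero-shot when `demos` is empty, k-shot when it holds k solved items."""
    blocks = [format_item(d, with_answer=True) for d in demos]
    blocks.append(format_item(target, with_answer=False))
    return "\n\n".join(blocks)


def accuracy(predicted: Sequence[str], examples: Sequence[MCExample]) -> float:
    """Share of predicted letters that match the gold letter."""
    if not examples:
        return 0.0
    correct = sum(
        p.strip().upper() == LETTERS[ex.answer_index]
        for p, ex in zip(predicted, examples)
    )
    return correct / len(examples)


# Made-up items, not drawn from the released datasets.
demo = MCExample("Hva er hovedstaden i Norge?", ["Bergen", "Oslo"], 1)
target = MCExample("Hvilken farge har himmelen?", ["Blå", "Grønn"], 0)
zero_shot_prompt = build_prompt(target)
one_shot_prompt = build_prompt(target, [demo])
```

Under these assumptions, running the same target items through `build_prompt` with an empty and a non-empty `demos` list, once per written standard, would give the kind of Bokmål versus Nynorsk and zero- versus few-shot comparison the abstract describes.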

Problem

Research questions and friction points this paper is trying to address.

Norwegian Language

Question Answering Dataset

World Knowledge

Innovation

Methods, ideas, or system contributions that make the work stand out.

Norwegian QA Datasets

Multidialectal Language Modeling

Zero-shot and Few-shot Learning
