Sinhala Physical Common Sense Reasoning Dataset for Global PIQA

📅 2026-02-02
🤖 AI Summary
This work addresses the absence of physical commonsense reasoning datasets for Sinhala, a low-resource language, which has hindered localized commonsense research. The authors present the first Sinhala dataset built for Global PIQA, with most questions set in the Sri Lankan context. The dataset comprises 110 human-created and verified instances, each pairing a prompt with one correct answer and one plausible but incorrect answer. Data quality is ensured through manual curation and a dual-answer validation protocol. The resource provides a benchmark for training and evaluating multilingual commonsense models on Sinhala.

📝 Abstract
This paper presents the first-ever Sinhala physical common sense reasoning dataset created as part of Global PIQA. It contains 110 human-created and verified data samples, where each sample consists of a prompt, the corresponding correct answer, and a wrong answer. Most of the questions refer to the Sri Lankan context, where Sinhala is an official language.
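
The sample structure described above (a prompt, a correct answer, and a wrong answer) can be sketched in code as follows. This is a minimal illustration only: the class name `Sample`, its field names, and the `accuracy` helper are hypothetical and are not taken from the paper or from the Global PIQA release, and the placeholder strings stand in for real Sinhala text.

```python
import random
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Sample:
    """One two-choice item. Field names are assumed, not the official schema."""
    prompt: str
    correct: str
    wrong: str


def accuracy(
    samples: Sequence[Sample],
    choose: Callable[[str, str, str], int],
    seed: int = 0,
) -> float:
    """Fraction of samples where `choose` picks the correct answer.

    `choose(prompt, option_a, option_b)` returns 0 or 1. The option order
    is shuffled per sample so position carries no information.
    """
    if not samples:
        raise ValueError("samples must not be empty")
    rng = random.Random(seed)
    hits = 0
    for sample in samples:
        options = [sample.correct, sample.wrong]
        rng.shuffle(options)
        picked = choose(sample.prompt, options[0], options[1])
        hits += int(options[picked] == sample.correct)
    return hits / len(samples)


# Placeholder items for illustration; these are NOT entries from the dataset.
demo_samples = [
    Sample(prompt="placeholder prompt 1", correct="the longer answer", wrong="short"),
    Sample(prompt="placeholder prompt 2", correct="another long answer", wrong="tiny"),
]


def pick_longer(prompt: str, option_a: str, option_b: str) -> int:
    """Toy chooser that selects the longer option; a model would go here."""
    return 0 if len(option_a) >= len(option_b) else 1


demo_score = accuracy(demo_samples, pick_longer)
```

In practice the toy `pick_longer` chooser would be replaced by a function that scores both options with a language model and returns the index of the preferred one; shuffling the option order guards against a chooser that merely favors one position.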
Problem

Research questions and friction points this paper is trying to address.

Sinhala
physical common sense reasoning
dataset
Global PIQA
Sri Lankan context
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sinhala
physical common sense reasoning
Global PIQA
low-resource language
human-verified dataset