Sinhala Physical Common Sense Reasoning Dataset for Global PIQA

📅 2026-02-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of physical commonsense reasoning datasets for Sinhala, a low-resource language, which has hindered localized commonsense research. The authors present the first Sinhala dataset in the PIQA format, created as part of Global PIQA, with most questions set in the Sri Lankan context. The dataset comprises 110 instances, each consisting of a prompt paired with one correct answer and one plausible but incorrect answer. All samples are human-created and verified. The resource gives multilingual commonsense models a benchmark for an underrepresented language.

📝 Abstract

This paper presents the first-ever Sinhala physical common sense reasoning dataset created as part of Global PIQA. It contains 110 human-created and verified data samples, where each sample consists of a prompt, the corresponding correct answer, and a wrong answer. Most of the questions refer to the Sri Lankan context, where Sinhala is an official language.
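
The sample layout described above (a prompt, a correct answer, and a wrong answer) can be sketched as a small data structure with a two-choice accuracy check. This is a minimal illustration only: the names `PiqaSample`, `as_choices`, and `accuracy` are hypothetical, the actual Global PIQA field names and file format may differ, and the demo sample is an English placeholder that does not come from the dataset.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class PiqaSample:
    """One PIQA-style item (hypothetical field names)."""
    prompt: str
    correct: str
    wrong: str


def as_choices(sample: PiqaSample, rng: random.Random) -> Tuple[List[str], int]:
    """Shuffle the two answers so position carries no signal.

    Returns the shuffled choices and the index of the correct one.
    """
    choices = [sample.correct, sample.wrong]
    rng.shuffle(choices)
    return choices, choices.index(sample.correct)


def accuracy(
    samples: Sequence[PiqaSample],
    pick: Callable[[str, List[str]], int],
    seed: int = 0,
) -> float:
    """Fraction of samples where `pick` returns the correct choice index.

    `pick` stands in for any model: it receives the prompt and the two
    shuffled choices and returns 0 or 1.
    """
    if not samples:
        raise ValueError("need at least one sample")
    rng = random.Random(seed)
    hits = 0
    for sample in samples:
        choices, label = as_choices(sample, rng)
        if pick(sample.prompt, choices) == label:
            hits += 1
    return hits / len(samples)


# Placeholder item for illustration; NOT taken from the Sinhala dataset.
demo = [
    PiqaSample(
        prompt="To keep cooked rice warm for an hour,",
        correct="cover the pot with a lid.",
        wrong="leave the pot uncovered next to an open window.",
    )
]

# A trivial baseline that always picks the first choice.
first_choice_accuracy = accuracy(demo, lambda prompt, choices: 0)
```

Shuffling the answer order per sample is a design choice in this sketch, so that a model cannot score well by always choosing one position. With only two choices, random guessing sits at about 50% accuracy.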

Problem

Research questions and friction points this paper is trying to address.

Sinhala

physical common sense reasoning

dataset

Global PIQA

Sri Lankan context

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sinhala

physical common sense reasoning

Global PIQA