Physical Commonsense Reasoning for Lower-Resourced Languages and Dialects: a Study on Basque

📅 2026-02-16
🤖 AI Summary
This study addresses the underexplored capacity of large language models for physical commonsense reasoning in low-resource languages such as Basque, using tasks that are not framed as question answering (non-QA). It presents BasPhyCo, the first non-QA physical commonsense reasoning dataset for Basque, available in standard and dialectal variants and built with the Italian GITA as a starting point. Models are evaluated at three hierarchical levels: accuracy, consistency, and verifiability. The benchmark is applied to multilingual large language models as well as models pretrained specifically for Italian and Basque. Results indicate that, in terms of verifiability, current models exhibit limited physical commonsense reasoning capabilities in Basque, especially on dialectal variants. The dataset provides a new benchmark for commonsense reasoning research in low-resource languages.

📝 Abstract
Physical commonsense reasoning represents a fundamental capability of human intelligence, enabling individuals to understand their environment, predict future events, and navigate physical spaces. Recent years have witnessed growing interest in reasoning tasks within Natural Language Processing (NLP). However, no prior research has examined the performance of Large Language Models (LLMs) on non-question-answering (non-QA) physical commonsense reasoning tasks in low-resource languages such as Basque. Taking the Italian GITA as a starting point, this paper addresses this gap by presenting BasPhyCo, the first non-QA physical commonsense reasoning dataset for Basque, available in both standard and dialectal variants. We evaluate model performance across three hierarchical levels of commonsense understanding: (1) distinguishing between plausible and implausible narratives (accuracy), (2) identifying the conflicting element that renders a narrative implausible (consistency), and (3) determining the specific physical state that creates the implausibility (verifiability). These tasks were assessed using multiple multilingual LLMs as well as models pretrained specifically for Italian and Basque. Results indicate that, in terms of verifiability, LLMs exhibit limited physical commonsense capabilities in low-resource languages such as Basque, especially when processing dialectal variants.
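
The three hierarchical levels described in the abstract can be illustrated with a minimal scoring sketch. This is not the authors' evaluation code. It assumes that the levels are nested, so that an example counts toward consistency only if its plausibility judgment is correct, and toward verifiability only if the conflicting element is also correct. The `Judgment` schema, its field names, and the `tiered_scores` helper are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgment:
    """One gold annotation or model output for a pair of narratives (hypothetical schema)."""
    implausible_story: int   # index of the narrative judged implausible
    conflict: frozenset      # indices of the sentences that conflict
    state: str               # physical state behind the implausibility


def tiered_scores(gold, pred):
    """Return accuracy, consistency, and verifiability under the nesting assumption."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    accurate = consistent = verifiable = 0
    for g, p in zip(gold, pred):
        # Level 1: the implausible narrative is identified correctly.
        story_ok = g.implausible_story == p.implausible_story
        # Level 2 (assumed nested): the conflicting element is also correct.
        conflict_ok = story_ok and g.conflict == p.conflict
        # Level 3 (assumed nested): the physical state is also correct.
        state_ok = conflict_ok and g.state == p.state
        accurate += story_ok
        consistent += conflict_ok
        verifiable += state_ok
    n = len(gold)
    return {
        "accuracy": accurate / n,
        "consistency": consistent / n,
        "verifiability": verifiable / n,
    }


# Toy data: every gold item has the same annotation; predictions fail at different levels.
gold = [Judgment(1, frozenset({2, 4}), "wet") for _ in range(4)]
pred = [
    Judgment(1, frozenset({2, 4}), "wet"),     # correct at all three levels
    Judgment(1, frozenset({2, 4}), "broken"),  # wrong physical state
    Judgment(1, frozenset({1, 4}), "wet"),     # wrong conflicting element
    Judgment(0, frozenset({2, 4}), "wet"),     # wrong narrative
]
scores = tiered_scores(gold, pred)
print(scores)
```

Under this nesting assumption verifiability can never exceed consistency, and consistency can never exceed accuracy, which matches the pattern of the reported results, where the weakest performance appears at the verifiability level.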
Problem

Research questions and friction points this paper is trying to address.

physical commonsense reasoning
low-resource languages
Basque
dialectal variants
non-QA tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

physical commonsense reasoning
low-resource languages
Basque
non-QA dataset
dialectal variants