Everyday Physics in Korean Contexts: A Culturally Grounded Physical Reasoning Benchmark

📅 2025-09-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing physical commonsense reasoning benchmarks predominantly reflect Western cultural contexts and overlook how cultural differences shape physical problem-solving. To address this gap, the authors introduce EPiK (Everyday Physics in Korean Contexts), a Korean-culture-specific physical commonsense reasoning benchmark comprising 181 binary-choice questions that span 9 reasoning subtasks and 84 culturally grounded scenarios, with contexts ranging from kimchi to traditional fermentation. EPiK is built with a two-stage generation and verification pipeline that generates problems directly from Korean contexts rather than translating existing ones, aiming for both physical rigor and cultural authenticity. In the reported evaluations, Korean-specialized models consistently outperform general-purpose models of comparable size, which points to limitations of culturally agnostic language models in culture-specific physical reasoning and to the need for culturally aware benchmarks.

📝 Abstract

Existing physical commonsense reasoning benchmarks predominantly focus on Western contexts, overlooking cultural variations in physical problem-solving. To address this gap, we introduce EPiK (Everyday Physics in Korean Contexts), a novel benchmark comprising 181 binary-choice problems that test physical reasoning within Korean cultural contexts, ranging from kimchi (Korean food) to traditional fermentation. EPiK is constructed using a two-stage generation and verification pipeline to create culturally authentic problems across 9 reasoning subtasks and 84 scenarios. Unlike approaches based on simple translation, our method generates problems organically from Korean contexts while upholding rigorous physical reasoning standards. Our evaluations show that Korean-specialized models consistently outperform general-purpose models of comparable size. This performance gap highlights the limitations of culturally agnostic models and demonstrates the critical need for culturally aware benchmarks to truly measure language understanding. EPiK is publicly available at https://huggingface.co/datasets/jjae/EPiK.
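Because every EPiK problem is binary-choice, a model's score reduces to plain accuracy against a 50% chance baseline. The following is a minimal sketch of such a scorer, not the authors' evaluation code. The `BinaryChoiceItem` schema, its field names, the `always_first` baseline, and the placeholder items are all assumptions made for illustration; the released dataset's actual columns may differ.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class BinaryChoiceItem:
    # Hypothetical schema: the real dataset's column names may differ.
    question: str
    option_a: str
    option_b: str
    answer: int  # 0 means option_a is correct, 1 means option_b is correct


def accuracy(
    items: Iterable[BinaryChoiceItem],
    predict: Callable[[BinaryChoiceItem], int],
) -> float:
    """Return the fraction of items for which predict() picks the correct option."""
    items = list(items)
    if not items:
        raise ValueError("no items to score")
    correct = sum(1 for item in items if predict(item) == item.answer)
    return correct / len(items)


def always_first(item: BinaryChoiceItem) -> int:
    """Trivial baseline (assumption for illustration): always choose option_a."""
    return 0


# Placeholder items only; these are NOT taken from the benchmark.
toy_items = [
    BinaryChoiceItem("placeholder question 1", "choice A", "choice B", 0),
    BinaryChoiceItem("placeholder question 2", "choice A", "choice B", 1),
    BinaryChoiceItem("placeholder question 3", "choice A", "choice B", 1),
    BinaryChoiceItem("placeholder question 4", "choice A", "choice B", 0),
]

if __name__ == "__main__":
    print(f"baseline accuracy: {accuracy(toy_items, always_first):.2f}")
```

To score a real model, `predict` would wrap a call that returns 0 or 1 for each item. Comparing the result against a constant baseline like `always_first` is a quick check that the answer labels are not heavily skewed toward one option.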

Problem

Research questions and friction points this paper is trying to address.

Addressing the bias of physical reasoning benchmarks toward Western contexts

Creating culturally authentic Korean physical reasoning problems across diverse scenarios

Evaluating limitations of culturally agnostic models in physical commonsense reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Culturally grounded Korean physical reasoning benchmark

Two-stage generation and verification pipeline

Organically generated problems from Korean contexts

🔎 Similar Papers

Evaluating Visual and Cultural Interpretation: The K-Viscuit Benchmark with Human-VLM Collaboration