SemEval-2026 Task 7: Everyday Knowledge Across Diverse Languages and Cultures

📅 2026-05-04

🤖 AI Summary
This study systematically evaluates how well large language models generalize commonsense knowledge across multilingual and multicultural contexts, with a particular focus on low-resource languages and underrepresented cultures. The evaluation builds on a human-curated extension of the BLEnD benchmark covering more than 30 language–culture pairs and features two tracks, short-answer and multiple-choice. It strictly enforces a zero-shot setting: training, fine-tuning, or few-shot learning on the benchmark data is prohibited, while participants may otherwise use any NLP system. As a large-scale, purely evaluative shared task on cross-cultural commonsense reasoning, the initiative attracted more than 140 registered participants, and 62 teams submitted final results. Analysis reveals that state-of-the-art approaches perform substantially worse on low-resource languages, highlighting critical challenges in cultural alignment and cross-cultural commonsense transfer.
📝 Abstract
We present our shared task on evaluating the adaptability of LLMs and NLP systems across multiple languages and cultures. The task data consist of an extended version of our manually constructed BLEnD benchmark (Myung et al. 2024), covering more than 30 language-culture pairs, predominantly representing low-resource languages spoken across multiple continents. As the task is designed strictly for evaluation, participants were not permitted to use the data for training, fine-tuning, few-shot learning, or any other form of model modification. Our task includes two tracks: (a) Short-Answer Questions (SAQ) and (b) Multiple-Choice Questions (MCQ). Participants were required to predict labels and were allowed to submit any NLP system and adopt diverse modelling strategies, provided that the benchmark was used solely for evaluation. The task attracted more than 140 registered participants, and we received final submissions from 62 teams, along with 19 system description papers. We report the results and present an analysis of the best-performing systems and the most commonly adopted approaches. Furthermore, we discuss shared insights into open questions and challenges related to evaluation, misalignment, and methodological perspectives on model behaviour in low-resource languages and for under-represented cultures.
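The abstract states that participants predict labels in the MCQ track but does not say here how submissions are scored. As a minimal sketch, assuming plain accuracy computed separately for each language–culture pair and then macro-averaged so that every pair counts equally regardless of how many questions it has, scoring could look like the following. The function names `mcq_accuracy_by_pair` and `macro_average`, the pair identifiers, and the data layout (question id mapped to a pair and a gold label) are all hypothetical, not the organizers' official scorer.

```python
from collections import defaultdict


def mcq_accuracy_by_pair(gold, predictions):
    """Hypothetical scorer: accuracy per language-culture pair.

    gold: dict mapping question id -> (pair_id, correct_label)
    predictions: dict mapping question id -> predicted label
    A question with no prediction is counted as wrong.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for qid, (pair, label) in gold.items():
        total[pair] += 1
        if predictions.get(qid) == label:
            correct[pair] += 1
    return {pair: correct[pair] / total[pair] for pair in total}


def macro_average(per_pair):
    """Unweighted mean over pairs (assumed aggregation, not confirmed by the source)."""
    if not per_pair:
        return 0.0
    return sum(per_pair.values()) / len(per_pair)


if __name__ == "__main__":
    # Toy data with hypothetical pair ids; not taken from the benchmark.
    gold = {"q1": ("pair_1", "A"), "q2": ("pair_1", "C"), "q3": ("pair_2", "B")}
    predictions = {"q1": "A", "q2": "B", "q3": "B"}
    per_pair = mcq_accuracy_by_pair(gold, predictions)
    print(per_pair)                 # {'pair_1': 0.5, 'pair_2': 1.0}
    print(macro_average(per_pair))  # 0.75
```

On the toy data, the macro average is 0.75, whereas a micro average over all questions would give 2/3; with pairs of very different sizes, the macro view keeps a weak score on a small, low-resource pair from being masked by larger pairs.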
Problem

Research questions and friction points this paper is trying to address.

language adaptability
cultural diversity
low-resource languages
everyday knowledge
model evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-lingual evaluation
low-resource languages
cultural adaptation
LLM generalization
zero-shot benchmarking
Nedjma Ousidhoum
Lecturer (Assistant Professor), Cardiff University
Natural Language Processing, Computational Social Science, Machine Learning
Junho Myung
KAIST
NLP, HCI
Carla Perez-Almendros
Cardiff University
Jiho Jin
KAIST
NLP, Machine Learning
Amr Keleg
MBZUAI
Meriem Beloucif
Uppsala University
Yi Zhou
Cardiff University
Natural Language Processing, Computational Linguistics, DMML
Rodrigo Agerri
HiTZ Center - Ixa, University of the Basque Country UPV/EHU
Natural Language Processing
Vladimir Araujo
AI Research Scientist, Sailplane
Natural Language Processing, Deep Learning, Continual Learning, Recommender Systems
Naomi Baes
University of Melbourne
James Barry
IBM Research
Natural Language Processing
Joanne Boisson
Cardiff University
Nancy F. Chen
ISCA Fellow, AAIA Fellow, Multimodal Generative AI Group Leader, AI for Education Head at A*STAR
Agentic AI, Large Language Models, Conversational AI
Christine de Kock
NLP researcher, Melbourne University
natural language processing, machine learning
Aleksandra Edwards
Cardiff University
Joseba Fernandez de Landa
HiTZ Center, University of the Basque Country EHU
Mohamed Fazli Imam
MBZUAI
Huda Hakami
Taif University
Shu-Kai Hsieh
National Taiwan University
Joseph Marvin Imperial
National University Philippines
Roy Ka-Wei Lee
Singapore University of Technology and Design
Trust and Safety, Social Computing, Computational Social Science, Natural Language Processing
Zhengyuan Liu
Institute for Infocomm Research (I2R) - A*STAR; IEEE Senior Member.
Natural Language Processing, Artificial Intelligence, Human-Centered AI
Chenyang Lyu
Alibaba
Large Language Models, Natural Language Processing, Machine Learning
Younes Samih
IBM Research AI, IBM
LLMs, NLP, Arabic NLP
Johan Sjons
PhD Student, Stockholm University
Computational linguistics, first language acquisition, information theory