🤖 AI Summary
This work addresses a key limitation of current large language models: they struggle to accurately infer direct causal relationships between events in evidence-rich real-world scenarios. To this end, the paper introduces the first systematic benchmark for abductive causal reasoning grounded in multi-document evidence, requiring models to identify the most plausible direct cause of a target event from dispersed, noisy, multi-source textual inputs. The task explicitly incorporates core challenges such as evidence integration, filtering of indirect contextual information, and mitigation of semantic interference, and is formalized as a multiple-choice question-answering framework for standardized evaluation. Upon its release, the benchmark attracted 518 submissions from 122 teams, establishing a high-quality platform for evaluating and advancing research on event-level causal reasoning and multi-document comprehension.
📝 Abstract
Understanding why real-world events occur is important for both natural language processing and practical decision-making, yet direct-cause inference remains underexplored in evidence-rich settings. To address this gap, we organized SemEval-2026 Task 12: Abductive Event Reasoning (AER).\footnote{The task data is available at https://github.com/sooo66/semeval2026-task12-dataset.git} The task asks systems to identify the most plausible direct cause of a target event from supporting evidence. We formulate AER as an evidence-grounded multiple-choice benchmark that captures key challenges of real-world causal reasoning, including distributed evidence, indirect background factors, and semantically related but non-causal distractors. The shared task attracted 122 participants and received 518 submissions. This paper presents the task formulation, dataset construction pipeline, evaluation setup, and system results. AER provides a focused benchmark for abductive reasoning over real-world events and highlights challenges for future work on causal reasoning and multi-document understanding.
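The evidence-grounded multiple-choice formulation described above can be sketched as a small data structure plus an accuracy metric. This is a minimal illustrative sketch only: the field names (`target_event`, `evidence`, `options`, `label`) and the example contents are assumptions for demonstration, not the released dataset schema.

```python
# Minimal sketch of an evidence-grounded multiple-choice instance and
# accuracy scoring for a task like AER. Field names and example text are
# illustrative assumptions, not the actual dataset format.
from dataclasses import dataclass
from typing import List

@dataclass
class AERInstance:
    target_event: str   # the event whose direct cause must be identified
    evidence: List[str] # dispersed multi-document evidence snippets
    options: List[str]  # candidate causes, including non-causal distractors
    label: int          # index of the most plausible direct cause

def accuracy(instances: List[AERInstance], predictions: List[int]) -> float:
    """Fraction of instances where the predicted option matches the gold label."""
    correct = sum(int(p == inst.label) for inst, p in zip(instances, predictions))
    return correct / len(instances)

# Tiny invented example for demonstration only.
example = AERInstance(
    target_event="The factory halted production.",
    evidence=[
        "A storm damaged the regional power grid.",
        "The factory relies entirely on grid electricity.",
    ],
    options=["A power outage", "A labor strike", "A supply shortage"],
    label=0,
)
print(accuracy([example], [0]))  # 1.0
```

A system's submission under this sketch is simply a list of predicted option indices, scored by exact match against the gold labels.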