LEMONADE: A Large Multilingual Expert-Annotated Abstractive Event Dataset for the Real World

📅 2025-06-01
🤖 AI Summary
Problem: Traditional span-based extraction methods struggle to model cross-lingual event arguments and entities in multilingual event analysis. Method: This paper introduces abstractive event extraction (AEE) and its subtask, abstractive entity linking (AEL), shifting from local span extraction to holistic document understanding. We construct LEMONADE, an expert-annotated, entity-aware benchmark of real-world conflict events covering 20 languages, 171 countries, and 39,786 events, based on a partially reannotated subset of ACLED. We also propose ZEST, a zero-shot retrieval-based AEL system. Contribution/Results: The best zero-shot system achieves 58.3% end-to-end F1, with LLMs outperforming specialized event extraction models such as GoLLIE; ZEST attains 45.7% F1 on AEL, 22.0 points above OneNet (23.7%). The dataset, evaluation protocol, and code are publicly released.

📝 Abstract
This paper presents LEMONADE, a large-scale conflict event dataset comprising 39,786 events across 20 languages and 171 countries, with extensive coverage of region-specific entities. LEMONADE is based on a partially reannotated subset of the Armed Conflict Location & Event Data (ACLED), which has documented global conflict events for over a decade. To address the challenge of aggregating multilingual sources for global event analysis, we introduce abstractive event extraction (AEE) and its subtask, abstractive entity linking (AEL). Unlike conventional span-based event extraction, our approach detects event arguments and entities through holistic document understanding and normalizes them across the multilingual dataset. We evaluate various large language models (LLMs) on these tasks, adapt existing zero-shot event extraction systems, and benchmark supervised models. Additionally, we introduce ZEST, a novel zero-shot retrieval-based system for AEL. Our best zero-shot system achieves an end-to-end F1 score of 58.3%, with LLMs outperforming specialized event extraction models such as GoLLIE. For entity linking, ZEST achieves an F1 score of 45.7%, significantly surpassing OneNet, a state-of-the-art zero-shot baseline that achieves only 23.7%. However, these zero-shot results lag behind the best supervised systems by 20.1% and 37.0% in the end-to-end and AEL tasks, respectively, highlighting the need for further research.
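The F1 scores quoted in the abstract combine precision and recall. The sketch below shows the standard set-based definition on made-up entity links; the paper's exact scoring protocol is not described in this summary, so the function name and the identifiers are illustrative assumptions only.

```python
def precision_recall_f1(predicted: set, gold: set) -> tuple[float, float, float]:
    """Standard set-based precision, recall, and F1 over hashable items."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical entity identifiers, not taken from the dataset.
gold_links = {"entity:A", "entity:B", "entity:C"}
predicted_links = {"entity:A", "entity:B", "entity:D", "entity:E"}

precision, recall, f1 = precision_recall_f1(predicted_links, gold_links)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Two of the four predictions are correct and two of the three gold links are found, so precision and recall differ and F1 falls between them.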
Problem

Research questions and friction points this paper is trying to address.

Aggregating multilingual sources for global event analysis.
Detecting event arguments via holistic document understanding.
Normalizing entities across a multilingual event dataset.
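Why normalization matters for aggregation can be shown with a toy example. In the sketch below, two reports in different languages yield the same event record once their actors are mapped to canonical identifiers. The record fields, the alias table, and the identifiers are all hypothetical; a real abstractive system would infer arguments from the whole document rather than look up surface strings.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class EventRecord:
    """Hypothetical normalized event record (not the dataset's schema)."""
    event_type: str
    country: str
    actor_ids: frozenset


# Toy alias table standing in for a real entity-normalization step.
ALIASES = {
    "police": "ent:police",
    "policía": "ent:police",
    "students": "ent:students",
    "estudiantes": "ent:students",
}


def normalize(mention: str) -> str:
    """Map a surface mention to a canonical id, or a fallback id."""
    return ALIASES.get(mention.strip().casefold(), "ent:unknown")


def make_record(event_type: str, country: str, mentions: list) -> EventRecord:
    return EventRecord(event_type, country, frozenset(normalize(m) for m in mentions))


# The same event, as it might be described in English and Spanish sources.
event_en = make_record("protest", "Chile", ["Students", "Police"])
event_es = make_record("protest", "Chile", ["estudiantes", "policía"])

# Identical records can be counted together regardless of source language.
counts = Counter([event_en, event_es])
print(counts[event_en])
```

Because the records are frozen and hashable, equal events from different languages collapse to one key, which is the property cross-lingual aggregation relies on.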
Innovation

Methods, ideas, or system contributions that make the work stand out.

Abstractive event extraction for multilingual analysis.
Zero-shot retrieval-based system for entity linking.
Evaluation of LLMs on event extraction tasks.
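The general idea of zero-shot retrieval-based entity linking can be sketched as ranking entries of an entity database by their similarity to a description of the actor, with no task-specific training. The example below uses bag-of-words cosine similarity as a simple stand-in; it is not ZEST's method, whose components are not described in this summary, and the database entries and identifiers are invented for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical entity database: id -> short description.
ENTITY_DB = {
    "ent:001": "national police force responsible for law enforcement",
    "ent:002": "farmers union organizing agricultural workers",
    "ent:003": "students from public universities protesting",
}


def tokenize(text: str) -> list:
    """Lowercase word tokens; a real system would use a stronger encoder."""
    return re.findall(r"\w+", text.casefold())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, k: int = 2) -> list:
    """Return the top-k (entity_id, score) pairs for a query description."""
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine(query_vec, Counter(tokenize(desc))), entity_id)
        for entity_id, desc in ENTITY_DB.items()
    ]
    # Highest score first; ties broken by entity id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(entity_id, score) for score, entity_id in scored[:k]]


top = retrieve("public universities students staged a march")
print(top[0][0])
```

Swapping the similarity function for a multilingual embedding model would let the same ranking loop handle queries and descriptions in different languages, which word overlap alone cannot do.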