🤖 AI Summary
This work investigates whether triggers are necessary for document-level event extraction and what functional role they play. We systematically compare triggers of varying quality—human-annotated, LLM-generated, keyword-based, and random—within both end-to-end and pipeline architectures. Contrary to conventional assumptions, our experiments show that triggers are not strictly indispensable: basic automatically generated triggers match human-annotated ones in performance, and, surprisingly, random triggers even improve F1 by +3.2% in prompt-based LLM approaches. To mitigate the performance degradation caused by low-quality triggers, we propose an event-description enhancement strategy that substantially improves model robustness. Our evaluation spans the AIDA, MAVEN-Doc, and RichERE benchmarks and combines neural models, prompt engineering, and multiple trigger-injection strategies. These findings reframe the role of triggers and inform the design of practical event extraction systems.
📝 Abstract
Most existing work on event extraction has focused on sentence-level texts and presumes the identification of a trigger-span -- a word or phrase in the input that evokes the occurrence of an event of interest. Event arguments are then extracted with respect to the trigger. Indeed, triggers are treated as integral to, and trigger detection as an essential component of, event extraction. In this paper, we provide the first investigation of the role of triggers for the more difficult and much less studied task of document-level event extraction. We analyze their usefulness in multiple end-to-end and pipelined neural event extraction models on three document-level event extraction datasets, measuring performance using triggers of varying quality (human-annotated, LLM-generated, keyword-based, and random). Our research shows that trigger effectiveness varies with the characteristics of the extraction task and the quality of the data, and that basic, automatically-generated triggers serve as a viable alternative to human-annotated ones. Furthermore, providing detailed event descriptions to the extraction model helps maintain robust performance even when trigger quality degrades. Perhaps surprisingly, we also find that the mere existence of trigger input, even random ones, is important for prompt-based LLM approaches to the task.
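To make the experimental setup concrete, the comparison described above can be sketched as follows. This is a minimal, hypothetical illustration of how triggers of different quality (keyword-based vs. random) might be produced and injected into an extraction prompt alongside an optional event description; the function names, prompt template, and keyword list are invented for this sketch and are not the paper's actual implementation.

```python
import random

def keyword_trigger(document, event_keywords):
    """Keyword-based trigger: the first document word matching a known event keyword."""
    for word in document.split():
        w = word.strip(".,").lower()
        if w in event_keywords:
            return w
    return None

def random_trigger(document, seed=0):
    """Random trigger: an arbitrary word sampled from the document."""
    rng = random.Random(seed)
    return rng.choice(document.split()).strip(".,").lower()

def build_prompt(document, event_type, trigger, description=None):
    """Assemble an argument-extraction prompt, optionally with an event description."""
    parts = [f"Event type: {event_type}"]
    if description:
        # The paper finds detailed descriptions keep performance robust
        # even when trigger quality degrades.
        parts.append(f"Definition: {description}")
    parts.append(f"Trigger: {trigger}")
    parts.append(f"Document: {document}")
    parts.append("Extract the event arguments.")
    return "\n".join(parts)

doc = "Rebels attacked the convoy near the border, killing three soldiers."
kw = keyword_trigger(doc, {"attacked", "bombing"})
rnd = random_trigger(doc)
prompt = build_prompt(doc, "Conflict.Attack", kw,
                      description="An attack event in which an attacker harms a target.")
print(kw)   # attacked
```

In a full pipeline, prompts built this way (with human, LLM-generated, keyword, or random triggers) would each be sent to the same extraction model so that the F1 differences can be attributed to trigger quality alone.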