Cascading Large Language Models for Salient Event Graph Generation

📅 2024-06-26
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Generating event graphs from long documents involves several coupled challenges: detecting events, identifying the relations between them, and aligning unstructured text with a structured graph. Existing methods also tend to treat all events as equally important. This paper proposes CALLMSAE, a cascading LLM framework for salient event graph generation that requires no human annotation. It identifies narrative-critical events from LLM-generated summaries, then applies iterative code refinement prompting to build event relation graphs, removing hallucinated relations and recovering missing edges. Using CALLMSAE, we build NYT-SEG, a large-scale automatically annotated event graph dataset that can serve as a distant supervision signal. On a human-annotated test set, CALLMSAE outperforms competitive baselines, and contextualised graph generation models fine-tuned on NYT-SEG outperform models trained on CAEVO data.

📝 Abstract
Generating event graphs from long documents is challenging due to the inherent complexity of the multiple tasks involved, such as detecting events, identifying their relationships, and reconciling unstructured input with structured graphs. Recent studies typically treat all events as equally important, failing to distinguish the salient events crucial for understanding narratives. This paper presents CALLMSAE, a CAscading Large Language Model framework for SAlient Event graph generation, which leverages the capabilities of LLMs and eliminates the need for costly human annotations. We first prompt LLMs to generate summaries, from which salient events are identified. Next, we develop an iterative code refinement prompting strategy to generate event relation graphs, removing hallucinated relations and recovering missing edges. Powered by CALLMSAE, we present NYT-SEG, a large-scale automatically annotated event graph dataset which can serve as distant supervision signals. Contextualised graph generation models fine-tuned on NYT-SEG outperform models trained on CAEVO data. Results on a human-annotated test set show that the proposed method generates salient and more accurate graphs, outperforming competitive baselines.
Problem

Research questions and friction points this paper is trying to address.

Generate salient event graphs from long documents.
Leverage LLMs to eliminate costly human annotations.
Improve event graph accuracy and salience detection.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cascades LLMs to generate salient event graphs.
Uses iterative code refinement prompting to remove hallucinated relations and recover missing edges.
Builds NYT-SEG, a large-scale automatically annotated event graph dataset.
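
The iterative refinement idea listed above can be illustrated with a small sketch. This is not the paper's implementation: the function names (`refine_graph`, `fake_llm`), the edge tuple layout, the relation labels, and the stopping rule are all assumptions made for illustration, and the LLM call is replaced by a deterministic stand-in. The sketch only shows the general shape of the loop: propose edges, drop those that mention events outside the salient set, and repeat until the graph stops changing.

```python
from typing import Callable, FrozenSet, Set, Tuple

# Hypothetical edge layout: (source event, relation label, target event).
Edge = Tuple[str, str, str]


def refine_graph(
    salient_events: Set[str],
    propose: Callable[[FrozenSet[Edge]], Set[Edge]],
    max_rounds: int = 5,
) -> Set[Edge]:
    """Repeatedly ask `propose` for edges and keep only well-grounded ones."""
    graph: Set[Edge] = set()
    for _ in range(max_rounds):
        proposed = propose(frozenset(graph))
        # Drop self-loops and edges whose endpoints are not salient events
        # (a crude stand-in for removing hallucinated relations).
        kept = {
            (src, rel, dst)
            for (src, rel, dst) in proposed
            if src in salient_events and dst in salient_events and src != dst
        }
        if kept == graph:  # no change since the last round: stop
            break
        graph = kept
    return graph


def fake_llm(current: FrozenSet[Edge]) -> Set[Edge]:
    """Deterministic stand-in for an LLM call (assumption, not a real API)."""
    if not current:
        # First round: one valid edge and one edge with an unknown event.
        return {
            ("earthquake", "causes", "evacuation"),
            ("earthquake", "causes", "alien landing"),
        }
    # Later rounds: keep existing edges and recover one missing edge.
    return set(current) | {("evacuation", "before", "relief effort")}


events = {"earthquake", "evacuation", "relief effort"}
result = refine_graph(events, fake_llm)
for edge in sorted(result):
    print(edge)
```

In a real pipeline, `propose` would wrap a prompt that shows the document and the current graph to an LLM and parses its output. The filter here checks only event membership, which is a much weaker test than the refinement the abstract describes.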