🤖 AI Summary
Large language models (LLMs) exhibit limited performance on event-centric tasks requiring causal or temporal reasoning. To address this, we propose a fine-tuning-free structured prompting method that automatically converts causal event graphs into natural-language statements, enabling semantic alignment between graph-structured knowledge and textual prompts. We systematically design nine prompt configurations by orthogonally combining three reasoning paradigms (zero-shot, few-shot, and chain-of-thought) with three input modalities (text-only, graph-only, and text + graph). Evaluated on the TORQUESTRA benchmark, our approach achieves an average accuracy improvement of 5%, with gains of up to 12% in zero-shot settings and up to 18% when graph-augmented chain-of-thought prompting is effective. These results provide empirical evidence that explicitly encoding causal graph structure via natural-language prompting can enhance LLMs' event reasoning capabilities without any fine-tuning.
📝 Abstract
Large language models (LLMs) excel at general language tasks but often struggle with event-based questions, especially those requiring causal or temporal reasoning. We introduce TAG-EQA (Text-And-Graph for Event Question Answering), a prompting framework that injects causal event graphs into LLM inputs by converting structured relations into natural-language statements. TAG-EQA spans nine prompting configurations, combining three strategies (zero-shot, few-shot, chain-of-thought) with three input modalities (text-only, graph-only, text+graph), enabling a systematic analysis of when and how structured knowledge aids inference. On the TORQUESTRA benchmark, TAG-EQA improves accuracy by 5% on average over text-only baselines, with gains of up to 12% in zero-shot settings and 18% when graph-augmented CoT prompting is effective. While performance varies by model and configuration, our findings show that causal graphs can enhance event reasoning in LLMs without fine-tuning, offering a flexible way to encode structure in prompt-based QA.
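To make the core idea concrete, here is a minimal sketch of graph-to-text prompting in the spirit of TAG-EQA. The function names (`verbalize_graph`, `build_prompt`), the relation vocabulary, and the statement templates are illustrative assumptions, not the authors' actual implementation; the paper's point is only that causal edges are rendered as natural-language statements and prepended to the question.

```python
# Illustrative sketch (assumed names/templates, not the authors' code):
# turn causal event-graph edges into natural-language statements and
# assemble text-only, graph-only, or text+graph prompts, with optional CoT.

CAUSAL_TEMPLATES = {
    "causes": '"{src}" causes "{dst}".',
    "enables": '"{src}" enables "{dst}".',
    "before": '"{src}" happens before "{dst}".',
}

def verbalize_graph(edges):
    """Convert (source_event, relation, target_event) triples into
    natural-language statements suitable for inclusion in a prompt."""
    lines = []
    for src, rel, dst in edges:
        template = CAUSAL_TEMPLATES.get(rel, '"{src}" is related to "{dst}".')
        lines.append(template.format(src=src, dst=dst))
    return "\n".join(lines)

def build_prompt(question, passage=None, edges=None, cot=False):
    """Assemble one of the three input modalities:
    text-only (passage), graph-only (edges), or text+graph (both)."""
    parts = []
    if passage:
        parts.append(f"Passage:\n{passage}")
    if edges:
        parts.append("Causal event graph (as statements):\n" + verbalize_graph(edges))
    parts.append(f"Question: {question}")
    if cot:
        parts.append("Let's reason step by step about the order and causes of events.")
    return "\n\n".join(parts)
```

Under this sketch, the nine configurations arise from crossing the three modalities (which of `passage`/`edges` is supplied) with the three strategies (zero-shot, few-shot exemplars prepended, or `cot=True`).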