🤖 AI Summary
Event causality identification (ECI) frequently conflates correlation with causation, primarily because existing methods rely on superficial linguistic patterns and lack counterfactual reasoning. To address this, we ground our approach in the Rubin Causal Model, formalizing the earlier of two temporally ordered events as the treatment and the later event as the outcome. We propose the first synthetic control framework for textual ECI: using text embedding synthesis and inversion, it constructs "twin" counterfactual contexts from historical corpora that are semantically similar to the original context but differ in the treatment, enabling controlled estimation of the causal effect. This works around the fact that interventions cannot be carried out directly on text and makes causal attribution more rigorous. Evaluated on the challenging COPES-hard benchmark, our approach surpasses state-of-the-art models, including GPT-4, in both accuracy and robustness.
📝 Abstract
Event causality identification (ECI), the task of extracting causal relations between events from text, is crucial for distinguishing causation from correlation. Traditional approaches to ECI have relied primarily on linguistic patterns and multi-hop relational inference, which risks identifying false causality because causal language is often used informally and graphical inference can be specious. In this paper, we adopt the Rubin Causal Model to identify event causality: given two temporally ordered events, we treat the first event as the treatment and the second as the observed outcome. Determining whether the two are causally related involves manipulating the treatment and estimating the resulting change in the likelihood of the outcome. Because such manipulation can only be carried out conceptually in the text domain, as a workaround we try to find a twin for the protagonist in existing corpora. This twin should share the protagonist's life experiences before the treatment but undergo an intervention on the treatment. In practice, however, such a match is difficult to locate, which limits the feasibility of this approach. To address this issue, we use the synthetic control method to generate such a "twin" from relevant historical data, leveraging text embedding synthesis and inversion techniques. This approach allows us to identify causal relations more robustly than previous methods, including GPT-4, as demonstrated on the causality benchmark COPES-hard.
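To make the synthetic control idea concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes each pre-treatment context has already been mapped to an embedding vector (toy 2-dimensional lists here), that the donor contexts are historical cases in which the treatment event did not occur, and that each donor has a 0/1 indicator for whether the outcome event followed. It fits non-negative weights that sum to one so the weighted donor embeddings approximate the protagonist's embedding, then uses the same weights to average donor outcomes as a counterfactual. The paper instead inverts the synthesized embedding back to text; that step is omitted here. All function names (`synthetic_control_weights`, `estimated_effect`, and so on) are hypothetical.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def project_to_simplex(v: Sequence[float]) -> List[float]:
    """Euclidean projection onto {w : w_j >= 0, sum_j w_j = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t  # keep the threshold from the last valid prefix
    return [max(x - theta, 0.0) for x in v]


def synthetic_control_weights(
    target: Sequence[float],
    donors: Sequence[Sequence[float]],
    n_iter: int = 500,
) -> List[float]:
    """Projected gradient descent on ||sum_j w_j * donors[j] - target||^2."""
    if not donors:
        raise ValueError("at least one donor embedding is required")
    n, dim = len(donors), len(target)
    frob = sum(dot(d, d) for d in donors)
    if frob == 0.0:
        return [1.0 / n] * n
    # 2 * frob upper-bounds the gradient's Lipschitz constant, so this step is safe.
    lr = 1.0 / (2.0 * frob)
    w = [1.0 / n] * n
    for _ in range(n_iter):
        synth = [sum(w[j] * donors[j][k] for j in range(n)) for k in range(dim)]
        resid = [s - t for s, t in zip(synth, target)]
        grad = [2.0 * dot(d, resid) for d in donors]
        w = project_to_simplex([wj - lr * gj for wj, gj in zip(w, grad)])
    return w


def estimated_effect(
    treated_outcome: float,
    donor_outcomes: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Observed outcome minus the synthetic twin's weighted outcome."""
    return treated_outcome - dot(weights, donor_outcomes)


# Toy data (assumed, for illustration only).
protagonist = [0.5, 0.5]                           # pre-treatment context embedding
untreated = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]   # donor context embeddings
outcomes = [1.0, 0.0, 0.0]                         # did the second event follow?

weights = synthetic_control_weights(protagonist, untreated)
print(weights, estimated_effect(1.0, outcomes, weights))
```

An estimated effect near zero would suggest the second event is about as likely without the first, while a clearly positive value supports a causal reading. With real text, the quality of the estimate depends on how well the donor pool covers the protagonist's pre-treatment context.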