DTO: a Differentiable Training Objective for Effective Counterfactual Story Rewriting

📅 2026-05-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a central challenge in counterfactual story rewriting: existing models struggle to make precise local edits without disrupting unaffected plot elements or overall narrative coherence. The authors propose a differentiable training objective (DTO) whose loss function jointly optimizes fidelity to the reference rewrite and semantic consistency with the original narrative, and use it to fine-tune a Transformer end to end. The approach avoids the complexity of reinforcement learning and mitigates a limitation of maximum-likelihood training, which tends to overlook small, localized modifications. On the TimeTravel and ART datasets, the method outperforms both a maximum-likelihood baseline and a preference-based approach, and performs competitively against two contemporary large language models across all evaluation metrics.
📝 Abstract
Counterfactual story rewriting is a natural language processing task that requires updating an existing story to reflect a chosen alternative event while preserving all unaffected storyline elements and overall coherence. While large language models have recently made remarkable progress on this task, it remains challenging because the required modifications are typically very small and highly localized. As a consequence, models trained conventionally with the maximum-likelihood objective tend to overlook these nuances. At the same time, more sophisticated training approaches based on reinforcement learning are notoriously slow and difficult to set up. For these reasons, our paper proposes a novel, differentiable training objective (DTO) that directly optimizes for the requisite counterfactual improvements. In our approach, a transformer model is fine-tuned via end-to-end backpropagation against a fully differentiable loss function that jointly rewards (i) fidelity to the reference rewrite and (ii) semantic consistency with the source narrative. The empirical evaluation on the TimeTravel and ART datasets shows that the proposed DTO approach surpasses a maximum-likelihood baseline and a preference-based approach, and performs competitively against two contemporary large language models on all evaluation metrics. These findings substantiate the effectiveness of task-specific differentiable objectives for nuanced, controlled text-generation tasks.
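The abstract does not give the exact form of the loss, so the following is only a minimal sketch of how a two-term differentiable objective of this kind might look, not the paper's actual DTO. It assumes a hypothetical design: the model's per-position softmax output is turned into an "expected embedding", mean-pooled, and compared by cosine similarity to the pooled embeddings of the reference rewrite and of the source story. The names `joint_loss`, `embedding_table`, and `weight` are illustrative. Only the forward computation is shown in plain Python; every step (softmax, weighted sum, cosine) is smooth in the logits, which is what would let an autodiff framework backpropagate through it.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one position's vocabulary logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_embedding(probs, embedding_table):
    # Probability-weighted average of token embeddings (a "soft" token).
    dim = len(embedding_table[0])
    return [sum(p * embedding_table[tok][d] for tok, p in enumerate(probs))
            for d in range(dim)]

def mean_vector(vectors):
    # Mean-pool a list of equal-length vectors.
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def joint_loss(logits_per_position, reference_ids, source_ids,
               embedding_table, weight=0.5):
    # ASSUMPTION: illustrative loss, not the formulation used in the paper.
    soft = mean_vector([expected_embedding(softmax(l), embedding_table)
                        for l in logits_per_position])
    ref = mean_vector([embedding_table[t] for t in reference_ids])
    src = mean_vector([embedding_table[t] for t in source_ids])
    fidelity = 1.0 - cosine(soft, ref)      # term (i): match the reference rewrite
    consistency = 1.0 - cosine(soft, src)   # term (ii): stay close to the source
    return weight * fidelity + (1.0 - weight) * consistency

# Toy usage: a 3-token vocabulary with 2-dimensional embeddings.
embedding_table = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
good_loss = joint_loss([[10.0, -10.0, -10.0]], [0], [0], embedding_table)
bad_loss = joint_loss([[-10.0, 10.0, -10.0]], [0], [0], embedding_table)
```

In the toy usage, logits that concentrate on the reference token give a lower loss than logits that concentrate on an unrelated token. The `weight` parameter trades off the two terms; how the paper balances them is not stated in the abstract.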
Problem

Research questions and friction points this paper is trying to address.

counterfactual story rewriting
differentiable training objective
text generation
narrative coherence
localized modification
Innovation

Methods, ideas, or system contributions that make the work stand out.

differentiable training objective
counterfactual story rewriting
controlled text generation
semantic consistency
end-to-end fine-tuning