Incomplete Utterance Rewriting with Editing Operation Guidance and Utterance Augmentation

📅 2025-03-20
🏛️ Conference on Empirical Methods in Natural Language Processing
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of insufficient context focus, severe redundancy, and poor generalization due to scarce training data in incomplete utterance rewriting (IUR), this paper proposes EO-IUR, an edit-operation-guided multi-task learning framework. Our key contributions are: (1) a novel sequence-labeling-based supervision mechanism for edit operations, enabling token-level precise control over rewriting; (2) a two-dimensional utterance augmentation strategy integrating edit-operation-driven incomplete utterance augmentation and LLM-assisted historical dialogue augmentation; and (3) a dialogue-level heterogeneous lexical graph that models cross-turn, multi-granularity semantic relations. Evaluated on three benchmark datasets, EO-IUR significantly outperforms state-of-the-art methods, achieving consistent improvements in both rewriting accuracy and conciseness across open-domain and task-oriented dialogue scenarios.
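The summary's first contribution, token-level edit-operation supervision, can be illustrated with a minimal sketch. The exact labeling scheme in EO-IUR is not specified here, so the sketch below derives hypothetical KEEP/DELETE labels (plus inserted spans) by aligning the incomplete utterance with its rewrite using Python's standard `difflib`; the label names and alignment method are assumptions for illustration, not the paper's implementation.

```python
import difflib

def edit_operation_labels(incomplete, rewritten):
    """Assign a hypothetical edit-operation label (KEEP/DELETE) to each
    token of the incomplete utterance, and collect the spans that would
    need to be inserted to reach the rewritten utterance.
    Illustrative approximation only, not the paper's exact scheme."""
    sm = difflib.SequenceMatcher(a=incomplete, b=rewritten)
    labels, insertions = [], []
    for op, i1, i2, j1, j2 in sm.get_opcodes():
        if op == "equal":
            labels.extend(["KEEP"] * (i2 - i1))
        elif op == "delete":
            labels.extend(["DELETE"] * (i2 - i1))
        elif op == "insert":
            insertions.append((i1, rewritten[j1:j2]))
        elif op == "replace":
            labels.extend(["DELETE"] * (i2 - i1))
            insertions.append((i1, rewritten[j1:j2]))
    return labels, insertions

# Toy example: the pronoun "it" is resolved to "the movie" in the rewrite.
labels, ins = edit_operation_labels(["I", "like", "it"],
                                    ["I", "like", "the", "movie"])
```

Labels of this kind could then serve as auxiliary supervision for a sequence labeling head, steering the generator toward the tokens that actually change.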

📝 Abstract
Although existing generation methods for Incomplete Utterance Rewriting (IUR) can generate coherent utterances, they often include irrelevant and redundant tokens in the rewritten utterances because they fail to focus on critical tokens in the dialogue context. Furthermore, the limited size of the training datasets also leads to insufficient training of the IUR model. To address the first issue, we propose a multi-task learning framework, EO-IUR (Editing Operation-guided Incomplete Utterance Rewriting), which introduces editing operation labels generated by a sequence labeling module to guide the generation model to focus on critical tokens. We also introduce a token-level heterogeneous graph to represent dialogues. To address the second issue, we propose a two-dimensional utterance augmentation strategy, namely editing operation-based incomplete utterance augmentation and LLM-based historical utterance augmentation. Experimental results on three datasets demonstrate that EO-IUR outperforms previous state-of-the-art (SOTA) baselines in both open-domain and task-oriented dialogue.
Problem

Research questions and friction points this paper is trying to address.

Improves Incomplete Utterance Rewriting by focusing on critical tokens.
Addresses dataset size limitations with a two-dimensional augmentation strategy.
Introduces a multi-task learning framework for better dialogue representation.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-task learning with editing operation guidance
Token-level heterogeneous graph for dialogue representation
Two-dimensional utterance augmentation strategy
Zhiyu Cao
School of Computer Science and Technology, Soochow University, Suzhou, China
Peifeng Li
School of Computer Science and Technology, Soochow University, Suzhou, China
Yaxin Fan
School of Computer Science and Technology, Soochow University, Suzhou, China
Qiaoming Zhu
Soochow University
Natural Language Processing