🤖 AI Summary
Ontology relation construction in task-oriented dialogue systems heavily relies on manual curation, limiting generalisability and scalability.
Method: This paper proposes a generative ontology relation extraction method for unlabeled dialogue data. It introduces chain-of-thought (CoT) decoding to this task for the first time and designs a constrained multi-branch decoding mechanism that integrates prior knowledge from ontology term and relation spaces, coupled with confidence-threshold-based filtering to substantially mitigate large language model hallucination. The method supports both source-domain fine-tuning and few-shot prompting paradigms.
Contribution/Results: On two benchmark datasets, the approach achieves significant improvements in ontology relation extraction F1 score. It also demonstrates robust cross-domain transfer, sustaining consistent gains under domain shift. This advances the automation and scalability of ontology construction for task-oriented dialogue systems.
📝 Abstract
State-of-the-art task-oriented dialogue systems typically rely on task-specific ontologies to fulfil user queries. Such ontologies are normally built manually, which limits the application of specialised systems, while the majority of task-oriented dialogue data, such as customer service recordings, comes without an ontology or annotations. Dialogue ontology construction is an approach for automating that process and typically consists of two steps: term extraction and relation extraction. In this work, we focus on relation extraction in a transfer-learning set-up. To improve generalisation, we propose an extension to the decoding mechanism of large language models. We adapt Chain-of-Thought (CoT) decoding, recently developed for reasoning problems, to generative relation extraction: we generate multiple branches in the decoding space and select relations based on a confidence threshold. By constraining the decoding to ontology terms and relations, we aim to decrease the risk of hallucination. We conduct extensive experiments on two widely used datasets and find improvements in target-ontology performance for both source fine-tuned and one-shot prompted large language models.
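The branching-and-filtering idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: `next_token_probs` is a hypothetical stand-in for a language model's next-token distribution, `allowed` is the constrained term/relation vocabulary, and per-step confidence is taken as the top-1/top-2 probability margin, following the CoT-decoding idea.

```python
def branch_decode(next_token_probs, allowed, k=3, tau=0.5, max_len=8):
    """Constrained multi-branch greedy decoding with confidence filtering.

    Sketch only: `next_token_probs(prefix)` is assumed to return the
    model's next-token distribution as a {token: probability} dict, and
    `allowed` is the set of ontology terms/relations decoding is
    constrained to. Neither interface comes from the paper's code.
    """
    # Branch over the top-k first tokens, restricted to the allowed space.
    first = {t: p for t, p in next_token_probs(()).items() if t in allowed}
    branches = sorted(first, key=first.get, reverse=True)[:k]

    kept = []
    for start in branches:
        seq, margins = [start], []
        while len(seq) < max_len:
            probs = {t: p for t, p in next_token_probs(tuple(seq)).items()
                     if t in allowed or t == "<eos>"}
            ranked = sorted(probs, key=probs.get, reverse=True)
            top = ranked[0]
            # Per-step confidence: margin between top-1 and top-2 tokens.
            margins.append(probs[top] -
                           (probs[ranked[1]] if len(ranked) > 1 else 0.0))
            if top == "<eos>":
                break
            seq.append(top)
        confidence = sum(margins) / len(margins)
        if confidence >= tau:  # threshold filters low-confidence branches
            kept.append((tuple(seq), confidence))
    return kept


# Toy next-token distributions over a made-up hotel/restaurant ontology.
def toy_probs(prefix):
    table = {
        (): {"hotel": 0.6, "restaurant": 0.3, "taxi": 0.1},
        ("hotel",): {"has_slot": 0.9, "area": 0.1},
        ("hotel", "has_slot"): {"area": 0.8, "price": 0.2},
        ("restaurant",): {"has_slot": 0.5, "area": 0.5},
    }
    return table.get(prefix, {"<eos>": 1.0})


allowed = {"hotel", "restaurant", "has_slot", "area", "price"}
print(branch_decode(toy_probs, allowed, k=2, tau=0.6))
# The confidently decoded "hotel has_slot area" branch survives; the
# "restaurant" branch, ambiguous at its first step, falls below tau.
```

In this toy run the "taxi" token is pruned up front because it is outside the allowed ontology space, which is how constrained decoding reduces hallucinated relations; the confidence threshold then discards branches the model was uncertain about.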