🤖 AI Summary
This work addresses the degradation of classification performance in low-resource intent recognition caused by semantically ambiguous samples generated by large language models. To mitigate this issue, the authors propose the DDAIR framework, which introduces a semantic similarity mechanism based on Sentence Transformers to detect and filter synthetic samples that are ambiguous across categories. An iterative regeneration strategy then improves the class discriminability of the synthesized data. By explicitly resolving intent boundary ambiguity, the approach significantly improves intent recognition accuracy under low-resource conditions.
📝 Abstract
Large Language Models (LLMs) are effective for data augmentation in classification tasks such as intent detection, but they can inadvertently produce examples that are ambiguous with respect to non-target classes. We present DDAIR (Disambiguated Data Augmentation for Intent Recognition) to mitigate this problem. Using Sentence Transformers, we detect ambiguous class-guided augmented examples generated by LLMs for intent recognition in low-resource scenarios: synthetic examples that are semantically more similar to another intent than to their target one. We also provide an iterative re-generation method to resolve such ambiguities. Our findings show that sentence embeddings effectively help to (re)generate less ambiguous examples, and they suggest promising potential to improve classification performance in scenarios where intents are loosely or broadly defined.
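The paper's exact scoring rule is not reproduced here, but the stated criterion (a synthetic example is ambiguous when its embedding is more similar to another intent than to its target intent) can be sketched as follows. This is a minimal illustration, assuming per-intent embedding centroids and cosine similarity; in practice the embeddings would come from a Sentence Transformers model (e.g. `SentenceTransformer.encode`), and the function and variable names below are hypothetical:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def flag_ambiguous(example_emb: np.ndarray,
                   target_intent: str,
                   intent_centroids: dict[str, np.ndarray]) -> tuple[bool, str]:
    """Flag a synthetic example whose embedding is closer to some other
    intent's centroid than to its target intent's centroid.

    Returns (is_ambiguous, nearest_competing_intent_or_target).
    """
    target_sim = cosine(example_emb, intent_centroids[target_intent])
    for intent, centroid in intent_centroids.items():
        if intent == target_intent:
            continue
        if cosine(example_emb, centroid) > target_sim:
            return True, intent  # ambiguous: more similar to another intent
    return False, target_intent

# Toy 2-D illustration with made-up centroids:
centroids = {
    "book_flight": np.array([1.0, 0.0]),
    "cancel_flight": np.array([0.0, 1.0]),
}
# A sample generated for "book_flight" that drifts toward "cancel_flight":
ambiguous, nearest = flag_ambiguous(np.array([0.2, 0.9]), "book_flight", centroids)
# → ambiguous is True, nearest is "cancel_flight"
```

Flagged examples would then be fed back into the LLM prompt for the iterative re-generation step rather than being kept as training data.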