Known Intents, New Combinations: Clause-Factorized Decoding for Compositional Multi-Intent Detection

📅 2026-03-30
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the limited generalization of existing models to unseen intent combinations, a common challenge in real-world applications. To tackle this, the authors propose ClauseCompose, a method leveraging clause-level factorized decoding that trains lightweight decoders exclusively on single-intent data yet effectively recognizes novel intent compositions. The study also introduces CoMIX-Shift, the first benchmark specifically designed for compositional generalization in multi-intent understanding, featuring controlled data construction and zero-shot triplet evaluation protocols. Experimental results show that ClauseCompose achieves 95.7% exact-match accuracy on unseen intent pairs, substantially outperforming strong baselines such as full-sentence multilabel classification and fine-tuned BERT models.
๐Ÿ“ Abstract
Multi-intent detection papers usually ask whether a model can recover multiple intents from one utterance. We ask a harder and, for deployment, more useful question: can it recover new combinations of familiar intents? Existing benchmarks only weakly test this, because train and test often share the same broad co-occurrence patterns. We introduce CoMIX-Shift, a controlled benchmark built to stress compositional generalization in multi-intent detection through held-out intent pairs, discourse-pattern shift, longer and noisier wrappers, held-out clause templates, and zero-shot triples. We also present ClauseCompose, a lightweight decoder trained only on singleton intents, and compare it to whole-utterance baselines including a fine-tuned tiny BERT model. Across three random seeds, ClauseCompose reaches 95.7 exact match on unseen intent pairs, 93.9 on discourse-shifted pairs, 62.5 on longer/noisier pairs, 49.8 on held-out templates, and 91.1 on unseen triples. WholeMultiLabel reaches 81.4, 55.7, 18.8, 15.5, and 0.0; the BERT baseline reaches 91.5, 77.6, 48.9, 11.0, and 0.0. We also add a 240-example manually authored SNIPS-style compositional set with five held-out pairs; there, ClauseCompose reaches 97.5 exact match on unseen pairs and 86.7 under connector shift, compared with 41.3 and 10.4 for WholeMultiLabel. The results suggest that multi-intent detection needs more compositional evaluation, and that simple factorization goes surprisingly far once evaluation asks for it.
Problem

Research questions and friction points this paper is trying to address.

compositional generalization
multi-intent detection
unseen intent combinations
benchmarking
zero-shot triples
Innovation

Methods, ideas, or system contributions that make the work stand out.

compositional generalization
multi-intent detection
clause-factorized decoding
zero-shot intent combination
controlled benchmark