🤖 AI Summary
In few-shot multimodal dialogue intent recognition for e-commerce, multi-task training induces a seesaw effect: inter-task knowledge interference caused by cumulative weight updates. To address this, we propose a collaborative learning framework that integrates large-model post-training with small-model regularized knowledge decoupling. Our approach introduces a knowledge-decoupling paradigm that separates the strong representational capacity of multimodal large language models (MLLMs) from the interpretable rule-generation capacity of lightweight models. Specifically, it unifies MLLMs, a lightweight rule distillation network, a collaborative prediction mechanism, and a few-shot adaptive fine-tuning strategy to eliminate cross-task weight conflicts and enable positive knowledge transfer. Evaluated on two real-world Taobao datasets, our method achieves online weighted F1 improvements of 6.37% and 6.28%, significantly outperforming state-of-the-art approaches.
📝 Abstract
Few-shot multimodal dialogue intention recognition is a critical challenge in the e-commerce domain. Previous methods have primarily enhanced model classification capabilities through post-training techniques. However, our analysis reveals that training for few-shot multimodal dialogue intention recognition involves two interconnected tasks, leading to a seesaw effect in multi-task learning. This phenomenon is attributed to knowledge interference stemming from the superposition of weight matrix updates during training. To address these challenges, we propose Knowledge-Decoupled Synergetic Learning (KDSL), which mitigates these issues by using smaller models to transform knowledge into interpretable rules, while applying post-training to larger models. By enabling the large and small multimodal models to collaborate on prediction, our approach achieves significant improvements. Notably, we obtain strong results on two real Taobao datasets, with gains of 6.37% and 6.28% in online weighted F1 scores over the state-of-the-art method, validating the efficacy of our framework.
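The abstract describes a collaboration in which a small model's interpretable rules and a post-trained MLLM jointly produce predictions. As a minimal illustrative sketch (the paper's actual KDSL mechanism is not specified here; the rule format, confidence threshold, and fallback strategy below are all assumptions), one simple instantiation routes a query to the rule model when a distilled rule fires with high confidence, and otherwise defers to the MLLM:

```python
# Hypothetical sketch of large/small model collaborative prediction.
# Rule format, threshold, and fallback logic are illustrative assumptions,
# not the paper's actual KDSL design.

def rule_model_predict(sample, rules):
    """Small model: interpretable keyword rules mapping to (intent, confidence)."""
    for keywords, intent, conf in rules:
        if all(k in sample["text"] for k in keywords):
            return intent, conf
    return None, 0.0

def collaborate(sample, rules, mllm_predict, threshold=0.8):
    """Trust a confident rule-based prediction; otherwise defer to the MLLM."""
    intent, conf = rule_model_predict(sample, rules)
    if intent is not None and conf >= threshold:
        return intent
    return mllm_predict(sample)

# Toy usage with distilled rules and a stub MLLM predictor.
rules = [
    (["refund"], "return_request", 0.9),
    (["where", "order"], "logistics_query", 0.85),
]
mllm_stub = lambda s: "general_inquiry"  # stand-in for the post-trained MLLM

print(collaborate({"text": "I want a refund"}, rules, mllm_stub))   # rule fires
print(collaborate({"text": "hello there"}, rules, mllm_stub))       # MLLM fallback
```

The design intent such a split captures is that rule updates never touch the MLLM's weights, so improving one task's rules cannot interfere with the other task's learned representations.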