Knowledge-Decoupled Synergetic Learning: An MLLM based Collaborative Approach to Few-shot Multimodal Dialogue Intention Recognition

πŸ“… 2025-03-06
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
In few-shot multimodal dialogue intent recognition for e-commerce, multi-task training induces a seesaw effect: inter-task knowledge interference caused by cumulative weight updates. To address this, we propose a collaborative learning framework integrating large-model post-training with small-model regularized knowledge decoupling. Our approach pioneers a knowledge-decoupling paradigm that separates the strong representation capability of multimodal large language models (MLLMs) from the interpretable rule-generation capacity of lightweight models. Specifically, it unifies MLLMs, a lightweight rule distillation network, a collaborative prediction mechanism, and a few-shot adaptive fine-tuning strategy to eliminate cross-task weight conflicts and enable positive knowledge transfer. Evaluated on two real-world Taobao datasets, our method achieves online weighted F1 improvements of 6.37% and 6.28%, significantly outperforming state-of-the-art approaches.

πŸ“ Abstract
Few-shot multimodal dialogue intention recognition is a critical challenge in the e-commerce domain. Previous methods have primarily enhanced model classification capabilities through post-training techniques. However, our analysis reveals that training for few-shot multimodal dialogue intention recognition involves two interconnected tasks, leading to a seesaw effect in multi-task learning. This phenomenon is attributed to knowledge interference stemming from the superposition of weight matrix updates during the training process. To address these challenges, we propose Knowledge-Decoupled Synergetic Learning (KDSL), which mitigates these issues by utilizing smaller models to transform knowledge into interpretable rules, while post-training larger models. By facilitating collaboration between the large and small multimodal large language models for prediction, our approach demonstrates significant improvements. Notably, we achieve outstanding results on two real Taobao datasets, with enhancements of 6.37% and 6.28% in online weighted F1 scores compared to the state-of-the-art method, thereby validating the efficacy of our framework.
Problem

Research questions and friction points this paper is trying to address.

Addresses few-shot multimodal dialogue intention recognition challenges.
Mitigates knowledge interference in multi-task learning scenarios.
Improves classification accuracy in e-commerce dialogue systems.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Knowledge-Decoupled Synergetic Learning (KDSL) introduced
Small models transform knowledge into interpretable rules
Collaboration between large and small multimodal models
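The collaboration idea above can be illustrated with a minimal sketch: interpretable keyword rules (standing in for the small model's distilled rules) score each intent, and those scores are fused with the MLLM's class probabilities. All intents, rules, weights, and probabilities below are hypothetical placeholders, not the paper's actual implementation.

```python
# Illustrative sketch of KDSL-style collaborative prediction:
# a small model's distilled, interpretable rules vote on intents,
# and their scores are fused with an MLLM's class probabilities.
# Intents, rules, alpha, and probabilities are all hypothetical.

INTENTS = ["ask_price", "ask_shipping", "return_request"]

# Hypothetical interpretable rules distilled by the small model:
# each maps a trigger keyword to an intent label.
RULES = {
    "how much": "ask_price",
    "price": "ask_price",
    "deliver": "ask_shipping",
    "shipping": "ask_shipping",
    "refund": "return_request",
    "return": "return_request",
}

def rule_scores(text):
    """Score intents by counting fired rules, normalized to sum to 1."""
    counts = {intent: 0.0 for intent in INTENTS}
    lowered = text.lower()
    for keyword, intent in RULES.items():
        if keyword in lowered:
            counts[intent] += 1.0
    total = sum(counts.values())
    if total == 0:  # no rule fired: fall back to a uniform prior
        return {intent: 1.0 / len(INTENTS) for intent in INTENTS}
    return {intent: c / total for intent, c in counts.items()}

def collaborative_predict(text, mllm_probs, alpha=0.7):
    """Fuse MLLM probabilities with rule scores via a convex combination."""
    rules = rule_scores(text)
    fused = {
        intent: alpha * mllm_probs[intent] + (1 - alpha) * rules[intent]
        for intent in INTENTS
    }
    return max(fused, key=fused.get)

# Example: the MLLM is uncertain, but a fired rule breaks the tie.
mllm_probs = {"ask_price": 0.40, "ask_shipping": 0.35, "return_request": 0.25}
print(collaborative_predict("When will you deliver my order?", mllm_probs))
# -> ask_shipping
```

The convex combination keeps the large model's output dominant while letting the small model's rules correct it near decision boundaries, which is one plausible reading of how the two models "collaborate" without sharing weights.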
πŸ”Ž Similar Papers
No similar papers found.
Bin Chen
University of Chinese Academy of Sciences, Hangzhou, Zhejiang, China
Yu Zhang
University of Chinese Academy of Sciences, Hangzhou, Zhejiang, China
Hongfei Ye
University of Chinese Academy of Sciences, Hangzhou, Zhejiang, China
Ziyi Huang
Assistant Professor @ Arizona State University
Trustworthy AI for Health
Hongyang Chen
Sun Yat-sen University
SDN · Cloud Computing · Microservice · AIOps