Distilling Implicit Multimodal Knowledge into LLMs for Zero-Resource Dialogue Generation

📅 2024-05-16
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the scarcity of high-quality multimodal annotations in zero-resource dialogue generation, this paper proposes an implicit multimodal knowledge distillation paradigm that transfers knowledge from modalities such as images and audio to large language models (LLMs) without requiring target-modality annotations. Methodologically, the authors design an implicit distillation framework grounded in contrastive learning and gradient masking, incorporate multimodal prompts to bridge cross-modal semantic gaps, and use lightweight adapters over frozen LLM parameters for efficient fine-tuning. The approach is presented as the first to eliminate explicit modality alignment and annotation dependence. Experiments on image–text and speech–text dialogue tasks report substantial improvements over strong baselines (+12.7 BLEU, +9.3 METEOR), approaching fully supervised performance at one-fifth of the training cost. The work is positioned as a new paradigm for zero-resource cross-modal dialogue generation.

Problem

Research questions and friction points this paper is trying to address.

Enhancing LLMs for zero-resource dialogue generation
Integrating implicit multimodal knowledge into LLMs
Overcoming scarcity of diverse dialogue datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Visual Implicit Knowledge Distillation Framework
Implicit Query Transformer for knowledge extraction
Bidirectional Variational Information Fusion technique

Authors

Bo Zhang
School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
Hui Ma
School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
Jian Ding
School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
Jian Wang
School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
Bo Xu
School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
Hongfei Lin
Dalian University of Technology
natural language processing, sentimental analysis, text mining, social computing