Resource-Limited Joint Multimodal Sentiment Reasoning and Classification via Chain-of-Thought Enhancement and Distillation

📅 2025-08-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of jointly achieving interpretability and efficiency in multimodal sentiment reasoning and classification under resource-constrained settings, this paper proposes the first lightweight joint modeling paradigm: a 3B-parameter model that simultaneously generates interpretable sentiment reasoning chains and performs fine-grained classification. The authors design a three-tier "teacher-assistant-student" knowledge distillation framework: a multimodal large language model serves as the teacher to construct high-quality reasoning data and supervise an intermediate assistant model, and the student model is then optimized via multi-task learning and hierarchical distillation. Evaluated on four benchmark datasets, the method significantly outperforms existing lightweight models, achieving an +8.2 BLEU-4 gain in reasoning-chain quality and a +2.7% average accuracy improvement in sentiment classification, while demonstrating strong generalization and efficient deployability.

📝 Abstract
The surge in rich multimodal content on social media platforms has greatly advanced Multimodal Sentiment Analysis (MSA), with Large Language Models (LLMs) further accelerating progress in this field. Current approaches primarily leverage the knowledge and reasoning capabilities of parameter-heavy (Multimodal) LLMs for sentiment classification, overlooking autonomous multimodal sentiment reasoning generation in resource-constrained environments. We therefore focus on the Resource-Limited Joint Multimodal Sentiment Reasoning and Classification task, JMSRC, which simultaneously performs multimodal sentiment reasoning chain generation and sentiment classification with only a lightweight model. We propose a Multimodal Chain-of-Thought Reasoning Distillation model, MulCoT-RD, designed for JMSRC, which employs a "Teacher-Assistant-Student" distillation paradigm to address deployment constraints in resource-limited environments. We first leverage a high-performance Multimodal Large Language Model (MLLM) to generate the initial reasoning dataset and train a medium-sized assistant model with a multi-task learning mechanism. A lightweight student model is then jointly trained to perform efficient multimodal sentiment reasoning generation and classification. Extensive experiments on four datasets demonstrate that MulCoT-RD, with only 3B parameters, achieves strong performance on JMSRC while exhibiting robust generalization and enhanced interpretability.
Problem

Research questions and friction points this paper is trying to address.

Enabling autonomous multimodal sentiment reasoning in resource-limited settings
Jointly performing sentiment reasoning and classification with lightweight models
Addressing deployment constraints via distillation for efficient multimodal analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight model for sentiment reasoning and classification
Teacher-Assistant-Student distillation paradigm
Multimodal Chain-of-Thought Reasoning Distillation
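The paper does not include code on this page, and the exact hierarchical distillation objective of MulCoT-RD is not specified here. As a rough, hedged illustration of the general idea behind a distillation step (a smaller model trained on a blend of the larger model's softened predictions and the gold sentiment labels), here is a minimal sketch of a standard Hinton-style distillation loss; all function names, the temperature, and the mixing weight `alpha` are illustrative assumptions, not the authors' implementation:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Blend a soft term (match the teacher's temperature-softened
    distribution) with a hard cross-entropy term on the gold label."""
    soft_s = softmax(student_logits, temperature)
    soft_t = softmax(teacher_logits, temperature)
    # Soft term: KL between teacher and student, scaled by T^2 as is
    # conventional in knowledge distillation.
    soft_loss = kl_divergence(soft_t, soft_s) * temperature ** 2
    # Hard term: student cross-entropy on the true sentiment label.
    hard_loss = -math.log(softmax(student_logits)[true_label])
    return alpha * soft_loss + (1 - alpha) * hard_loss
```

In a three-tier setup like the one summarized above, a loss of this shape would plausibly be applied twice: once with the MLLM teacher supervising the assistant, and once with the assistant supervising the 3B student, alongside the reasoning-chain generation loss from the multi-task objective.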
Haonan Shangguan
School of Computer Science and Engineering, Northeastern University, Shenyang, China
Xiaocui Yang
Lecturer, Northeastern University (China)
Multimodal Sentiment Analysis, Data Mining, Multimodal Large Language Models
Shi Feng
School of Computer Science and Engineering, Northeastern University, Shenyang, China
Daling Wang
School of Computer Science and Engineering, Northeastern University, Shenyang, China
Yifei Zhang
School of Computer Science and Engineering, Northeastern University, Shenyang, China
Ge Yu
School of Computer Science and Engineering, Northeastern University, Shenyang, China