PSA-MF: Personality-Sentiment Aligned Multi-Level Fusion for Multimodal Sentiment Analysis

📅 2025-12-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multimodal sentiment analysis (MSA) faces two key challenges: (1) unimodal feature extraction overlooks individual personality differences, resulting in coarse-grained sentiment representations; and (2) existing multimodal fusion methods fail to model inter-modal feature heterogeneity, undermining discriminative capability. To address these, we propose PSA-MF, a personality-sentiment aligned multi-level fusion framework and the first MSA approach to explicitly incorporate personality features. PSA-MF introduces a personality-sentiment alignment network to achieve cross-modal semantic calibration, and establishes a three-level fusion mechanism (pre-fusion, mid-level alignment, and high-level enhancement) to enable fine-grained, hierarchical feature interaction. Extensive experiments on CMU-MOSEI and CH-SIMS demonstrate that PSA-MF significantly outperforms state-of-the-art methods, validating the effectiveness of personality-guided multimodal sentiment understanding.
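
The page provides no implementation, so the following is a minimal PyTorch sketch of what a pre-fusion / mid-level alignment / high-level enhancement pipeline of this shape could look like. The module layout, feature dimensions, the use of cross-attention for the alignment stage, and the gating for the enhancement stage are all illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class ThreeLevelFusion(nn.Module):
    """Hypothetical three-level fusion: pre-fusion projection,
    mid-level cross-modal alignment, high-level gated enhancement."""

    def __init__(self, d_text=768, d_vis=512, d_aud=128, d=256):
        super().__init__()
        # Level 1 (pre-fusion): project each modality into a shared space
        self.proj_t = nn.Linear(d_text, d)
        self.proj_v = nn.Linear(d_vis, d)
        self.proj_a = nn.Linear(d_aud, d)
        # Level 2 (mid-level alignment): text queries the other modalities
        # (cross-attention is an assumed choice, not confirmed by the paper)
        self.align_v = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        self.align_a = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
        # Level 3 (high-level enhancement): gated residual over the streams
        self.gate = nn.Sequential(nn.Linear(3 * d, d), nn.Sigmoid())
        self.head = nn.Linear(d, 1)  # sentiment intensity score

    def forward(self, x_t, x_v, x_a):
        # x_*: (batch, seq_len, feature_dim) unimodal sequence features
        t, v, a = self.proj_t(x_t), self.proj_v(x_v), self.proj_a(x_a)
        tv, _ = self.align_v(t, v, v)  # text attends to visual
        ta, _ = self.align_a(t, a, a)  # text attends to audio
        pooled = torch.cat([t, tv, ta], dim=-1).mean(dim=1)  # pool over time
        g = self.gate(pooled)
        enhanced = g * t.mean(dim=1) + (1 - g) * (tv + ta).mean(dim=1)
        return self.head(enhanced)
```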

📝 Abstract
Multimodal sentiment analysis (MSA) is a research field that recognizes human sentiments by combining textual, visual, and audio modalities. The main challenge lies in integrating sentiment-related information from different modalities, and it arises chiefly in two phases: unimodal feature extraction and multimodal feature fusion. During extraction, existing methods capture only shallow information from unimodal features and neglect sentiment differences across personalities. During fusion, they directly merge the features of each modality without considering feature-level differences, which ultimately degrades recognition performance. To address this, we propose a personality-sentiment aligned multi-level fusion framework. We introduce personality traits during the feature extraction phase and, for the first time, propose a personality-sentiment alignment method that obtains personalized sentiment embeddings from the textual modality. In the fusion phase, we introduce a novel multi-level fusion method that gradually integrates sentiment information from the textual, visual, and audio modalities through multimodal pre-fusion and a multi-level enhanced fusion strategy. Evaluated through multiple experiments on two commonly used datasets, our method achieves state-of-the-art results.
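
As a companion to the fusion sketch above, here is a hedged sketch of what the personality-sentiment alignment step could look like: a personality vector is projected into the same space as the text features, pulled toward them with a cosine alignment term, and fused into a personalized sentiment embedding. The Big Five trait input, every module name, and every dimension are assumptions; the abstract does not specify the personality representation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PersonalitySentimentAlignment(nn.Module):
    """Hypothetical alignment of personality traits with text sentiment
    features; yields a personalized sentiment embedding plus a loss term."""

    def __init__(self, d_text=768, d_pers=5, d=256):
        super().__init__()
        self.text_enc = nn.Linear(d_text, d)  # stand-in for a real text encoder
        self.pers_enc = nn.Linear(d_pers, d)  # personality-trait projection
        self.fuse = nn.Linear(2 * d, d)       # personalized sentiment embedding

    def forward(self, text_feat, pers_feat):
        t = self.text_enc(text_feat)  # (batch, d)
        p = self.pers_enc(pers_feat)  # (batch, d)
        # Cosine term pulls the text and personality views together
        align_loss = 1 - F.cosine_similarity(t, p, dim=-1).mean()
        z = self.fuse(torch.cat([t, p], dim=-1))
        return z, align_loss
```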
Problem

Research questions and friction points this paper is trying to address.

Integrates sentiment across text, visual, and audio modalities
Addresses personality differences in sentiment feature extraction
Enhances multimodal fusion with multi-level strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces personality traits in feature extraction phase
Proposes personality-sentiment alignment for personalized embeddings
Uses multi-level fusion with pre-fusion and enhanced strategies (a joint-training sketch follows this list)
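
Tying the two sketches together, a plausible joint objective would optimize the sentiment task loss and the alignment loss at once. This usage example reuses `ThreeLevelFusion` and `PersonalitySentimentAlignment` from the sketches above; the L1 task loss, the 0.1 weight, and the random tensors are illustrative stand-ins, not details from the paper.

```python
import torch
import torch.nn.functional as F

# Assumed joint training step over both hypothetical modules
align_net = PersonalitySentimentAlignment()
fusion_net = ThreeLevelFusion()
optimizer = torch.optim.Adam(
    list(align_net.parameters()) + list(fusion_net.parameters()), lr=1e-4
)

x_t = torch.randn(8, 20, 768)  # text sequence features
x_v = torch.randn(8, 20, 512)  # visual sequence features
x_a = torch.randn(8, 20, 128)  # audio sequence features
pers = torch.rand(8, 5)        # e.g. Big Five trait scores (assumed input)
y = torch.randn(8, 1)          # sentiment intensity labels

_, align_loss = align_net(x_t.mean(dim=1), pers)  # pooled text features
pred = fusion_net(x_t, x_v, x_a)
loss = F.l1_loss(pred, y) + 0.1 * align_loss      # assumed loss weighting
optimizer.zero_grad()
loss.backward()
optimizer.step()
```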
👥 Authors

Heng Xie
School of Computer Science & Technology, Beijing Institute of Technology, Beijing, China

Kang Zhu
School of Computer Science, Wuhan University, Wuhan, China

Zhengqi Wen
Tsinghua University (research interests: LLM)

Jianhua Tao
Beijing National Research Center for Information Science and Technology, Tsinghua University, Beijing, China; Department of Automation, Tsinghua University, Beijing, China

Xuefei Liu
College of Computer and Information Engineering, Tianjin Normal University, Tianjin, China

Ruibo Fu
Associate Professor, CASIA (research interests: AIGC, LMM, intelligent speech interaction, deepfake detection)

Changsheng Li
Beijing Institute of Technology (research interests: flexible robotics, mechanical design, robotics, medical robotics, surgical robotics)