More Is Better: A MoE-Based Emotion Recognition Framework with Human Preference Alignment

📅 2025-08-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the dual challenges of label scarcity and misalignment with human preferences in semi-supervised emotion recognition, this paper proposes a multimodal Mixture-of-Experts (MoE) framework. It models the visual, textual, and Action Unit (AU) modalities as independent experts and, as its key novelty, strengthens them through knowledge distillation from the large vision-language model Gemini. A consensus-driven pseudo-labeling mechanism selects high-quality unlabeled samples based on agreement between the baseline model's predictions and Gemini's outputs, and these pseudo-labels feed a two-stage training strategy. Finally, multi-expert ensemble voting coupled with rule-based re-ranking mitigates prediction bias and improves alignment with human preferences. Evaluated on the MER2025-SEMI challenge, the method achieves an F1-score of 0.8772, ranking second in the track, and demonstrates substantially better generalization in semi-supervised settings along with stronger consistency with human evaluations.
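The consensus-driven pseudo-labeling step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: all function and variable names are hypothetical, and it assumes both the baseline model and Gemini emit a discrete emotion label per sample.

```python
# Hypothetical sketch of consensus-based pseudo-labeling: an unlabeled
# sample receives a pseudo-label only when the baseline model and the
# Gemini-derived label agree. Names are illustrative, not from the paper.

def consensus_pseudo_labels(baseline_preds, gemini_preds):
    """Return {sample_id: label} for samples where both sources agree."""
    pseudo = {}
    for sample_id, base_label in baseline_preds.items():
        gemini_label = gemini_preds.get(sample_id)
        if gemini_label is not None and gemini_label == base_label:
            pseudo[sample_id] = base_label  # treated as high-confidence
    return pseudo

baseline = {"v1": "happy", "v2": "sad", "v3": "neutral"}
gemini   = {"v1": "happy", "v2": "angry", "v3": "neutral"}
print(consensus_pseudo_labels(baseline, gemini))  # {'v1': 'happy', 'v3': 'neutral'}
```

Samples where the two sources disagree (here "v2") are simply dropped, which trades coverage of the unlabeled pool for label quality.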

📝 Abstract
In this paper, we present our solution for the semi-supervised learning track (MER-SEMI) in MER2025. We propose a comprehensive framework, grounded in the principle that "more is better," to construct a robust Mixture of Experts (MoE) emotion recognition system. Our approach integrates a diverse range of input modalities as independent experts, including novel signals such as knowledge from large Vision-Language Models (VLMs) and temporal Action Unit (AU) information. To effectively utilize unlabeled data, we introduce a consensus-based pseudo-labeling strategy, generating high-quality labels from the agreement between a baseline model and Gemini, which are then used in a two-stage training paradigm. Finally, we employ a multi-expert voting ensemble combined with a rule-based re-ranking process to correct prediction bias and better align the outputs with human preferences. Evaluated on the MER2025-SEMI challenge dataset, our method achieves an F1-score of 0.8772 on the test set, ranking 2nd in the track. Our code is available at https://github.com/zhuyjan/MER2025-MRAC25.
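The multi-expert voting ensemble with rule-based re-ranking mentioned in the abstract can be sketched roughly as below. This is an assumed illustration, not the paper's actual rules: the boost weights and class names are hypothetical stand-ins for whatever bias-correction rules the authors apply.

```python
from collections import Counter

# Illustrative sketch of multi-expert voting followed by rule-based
# re-ranking. The weights below are invented for the example; the paper's
# actual rules for correcting prediction bias are not specified here.

RARE_CLASS_BOOST = {"surprise": 1.2, "worried": 1.1}  # assumed rule weights

def vote_and_rerank(expert_preds):
    """expert_preds: list of labels, one per expert, for a single sample."""
    counts = Counter(expert_preds)
    # Re-rank: scale raw vote counts by rule-based weights so that
    # systematically under-predicted classes can win close votes.
    scored = {lbl: n * RARE_CLASS_BOOST.get(lbl, 1.0) for lbl, n in counts.items()}
    return max(scored, key=scored.get)

# A 2-2 tie between "happy" and "surprise" resolves to "surprise"
# because its boosted score (2 * 1.2 = 2.4) exceeds "happy" (2.0).
print(vote_and_rerank(["happy", "happy", "surprise", "surprise"]))  # surprise
```

The point of the re-ranking pass is that plain majority voting tends to favor frequent classes; a post-hoc rule layer nudges the final decision toward human-preferred labels without retraining the experts.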
Problem

Research questions and friction points this paper is trying to address.

Enhancing emotion recognition with diverse input modalities
Improving pseudo-labeling quality for semi-supervised learning
Aligning model predictions with human preferences
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture of Experts integrates diverse input modalities
Consensus-based pseudo-labeling for unlabeled data
Multi-expert voting with rule-based re-ranking
Jun Xie
Lenovo Research, Beijing, China
Yingjian Zhu
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Feng Chen
Lenovo Research, Beijing, China
Zhenghao Zhang
Florida State University
Xiaohui Fan
Tsinghua University, Beijing, China
Hongzhu Yi
University of Chinese Academy of Sciences, Beijing, China
Xinming Wang
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Chen Yu
Beijing Jiaotong University, Beijing, China
Yue Bi
Shandong University, Beijing, China
Zhaoran Zhao
Lenovo Research, Beijing, China
Xiongjun Guan
Tsinghua University, Beijing, China
Zhepeng Wang
Applied Scientist at Amazon Stores Foundational AI