More Is Better: A MoE-Based Emotion Recognition Framework with Human Preference Alignment

📅 2025-08-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the dual challenges of label scarcity and misalignment with human preferences in semi-supervised emotion recognition, this paper proposes a multimodal Mixture-of-Experts (MoE) framework. It models the visual, textual, and Action Unit (AU) modalities as independent experts and, as its key novelty, strengthens them through knowledge distillation from the large vision-language model Gemini. A consensus-driven pseudo-labeling mechanism selects high-quality unlabeled samples based on agreement between the baseline model's predictions and Gemini's outputs, and these pseudo-labels feed a two-stage training strategy. Finally, multi-expert ensemble voting coupled with rule-based re-ranking mitigates prediction bias and improves alignment with human preferences. Evaluated on the MER2025-SEMI challenge, the method achieves an F1-score of 0.8772, ranking second in the track, and demonstrates substantially better generalization in semi-supervised settings along with stronger consistency with human evaluations.
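The consensus-driven pseudo-labeling step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: all function and variable names are hypothetical, and it assumes both the baseline model and Gemini emit a discrete emotion label per sample.

```python
# Hypothetical sketch of consensus-based pseudo-labeling: an unlabeled
# sample receives a pseudo-label only when the baseline model and the
# Gemini-derived label agree. Names are illustrative, not from the paper.

def consensus_pseudo_labels(baseline_preds, gemini_preds):
    """Return {sample_id: label} for samples where both sources agree."""
    pseudo = {}
    for sample_id, base_label in baseline_preds.items():
        gemini_label = gemini_preds.get(sample_id)
        if gemini_label is not None and gemini_label == base_label:
            pseudo[sample_id] = base_label  # treated as high-confidence
    return pseudo

baseline = {"v1": "happy", "v2": "sad", "v3": "neutral"}
gemini   = {"v1": "happy", "v2": "angry", "v3": "neutral"}
print(consensus_pseudo_labels(baseline, gemini))  # {'v1': 'happy', 'v3': 'neutral'}
```

Samples where the two sources disagree (here "v2") are simply dropped, which trades coverage of the unlabeled pool for label quality.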

📝 Abstract
In this paper, we present our solution for the semi-supervised learning track (MER-SEMI) in MER2025. We propose a comprehensive framework, grounded in the principle that "more is better," to construct a robust Mixture of Experts (MoE) emotion recognition system. Our approach integrates a diverse range of input modalities as independent experts, including novel signals such as knowledge from large Vision-Language Models (VLMs) and temporal Action Unit (AU) information. To effectively utilize unlabeled data, we introduce a consensus-based pseudo-labeling strategy, generating high-quality labels from the agreement between a baseline model and Gemini, which are then used in a two-stage training paradigm. Finally, we employ a multi-expert voting ensemble combined with a rule-based re-ranking process to correct prediction bias and better align the outputs with human preferences. Evaluated on the MER2025-SEMI challenge dataset, our method achieves an F1-score of 0.8772 on the test set, ranking 2nd in the track. Our code is available at https://github.com/zhuyjan/MER2025-MRAC25.
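The multi-expert voting ensemble with rule-based re-ranking mentioned in the abstract can be sketched roughly as below. This is an assumed illustration, not the paper's actual rules: the boost weights and class names are hypothetical stand-ins for whatever bias-correction rules the authors apply.

```python
from collections import Counter

# Illustrative sketch of multi-expert voting followed by rule-based
# re-ranking. The weights below are invented for the example; the paper's
# actual rules for correcting prediction bias are not specified here.

RARE_CLASS_BOOST = {"surprise": 1.2, "worried": 1.1}  # assumed rule weights

def vote_and_rerank(expert_preds):
    """expert_preds: list of labels, one per expert, for a single sample."""
    counts = Counter(expert_preds)
    # Re-rank: scale raw vote counts by rule-based weights so that
    # systematically under-predicted classes can win close votes.
    scored = {lbl: n * RARE_CLASS_BOOST.get(lbl, 1.0) for lbl, n in counts.items()}
    return max(scored, key=scored.get)

# A 2-2 tie between "happy" and "surprise" resolves to "surprise"
# because its boosted score (2 * 1.2 = 2.4) exceeds "happy" (2.0).
print(vote_and_rerank(["happy", "happy", "surprise", "surprise"]))  # surprise
```

The point of the re-ranking pass is that plain majority voting tends to favor frequent classes; a post-hoc rule layer nudges the final decision toward human-preferred labels without retraining the experts.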
Problem

Research questions and friction points this paper is trying to address.

Enhancing emotion recognition with diverse input modalities
Improving pseudo-labeling quality for semi-supervised learning
Aligning model predictions with human preferences
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture of Experts integrates diverse input modalities
Consensus-based pseudo-labeling for unlabeled data
Multi-expert voting with rule-based re-ranking
Jun Xie
Lenovo Research, Beijing, China
Yingjian Zhu
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Feng Chen
Lenovo Research, Beijing, China
Zhenghao Zhang
Florida State University
Xiaohui Fan
Tsinghua University, Beijing, China
Hongzhu Yi
University of Chinese Academy of Sciences, Beijing, China
Xinming Wang
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Chen Yu
Beijing Jiaotong University, Beijing, China
Yue Bi
Shandong University, Beijing, China
Zhaoran Zhao
Lenovo Research, Beijing, China
Xiongjun Guan
Tsinghua University, Beijing, China
Zhepeng Wang
Applied Scientist at Amazon Stores Foundational AI