FEALLM: Advancing Facial Emotion Analysis in Multimodal Large Language Models with Emotional Synergy and Reasoning

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional facial emotion analysis (FEA) methods suffer from poor interpretability, limited generalization, and weak reasoning, while existing multimodal large language models (MLLMs) lack fine-grained action unit (AU) modeling. Method: We introduce an FEA instruction dataset that aligns facial expression (FE) and AU descriptions and encodes causal reasoning between them, construct FEABench as its benchmark, and propose FEALLM, an MLLM architecture that captures finer-grained facial information. Technically, the approach combines visual encoder fine-tuning, AU-aware feature alignment, causally driven instruction tuning, and zero-shot cross-domain transfer. Results: Our method achieves state-of-the-art performance on FEABench and zero-shot accuracies of 89.7% (RAF-DB), 62.3% (AffectNet), 74.1% (BP4D), and 68.5% (DISFA), demonstrating strong generalization, robustness, and interpretability.
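To make the "AU-aware feature alignment" step above concrete, here is a minimal sketch of one plausible formulation: a multi-label loss that pulls pooled visual features toward frozen text embeddings of the action units active in each face. The function names, the loss choice, and the tensor shapes are our assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of AU-aware feature alignment (not the paper's code):
# align pooled visual features with per-AU text embeddings so that the
# language model receives AU-grounded visual representations.
import torch
import torch.nn.functional as F

def au_alignment_loss(visual_feats: torch.Tensor,
                      au_text_embeds: torch.Tensor,
                      au_labels: torch.Tensor,
                      temperature: float = 0.07) -> torch.Tensor:
    """visual_feats: (B, D) pooled face features from the visual encoder.
    au_text_embeds: (K, D) frozen text embeddings, one per action unit.
    au_labels: (B, K) multi-hot AU activations (a face can show several AUs).
    """
    v = F.normalize(visual_feats, dim=-1)
    t = F.normalize(au_text_embeds, dim=-1)
    logits = v @ t.T / temperature   # (B, K) similarity of each face to every AU
    # Multi-label alignment: binary cross-entropy over AU activations.
    return F.binary_cross_entropy_with_logits(logits, au_labels.float())

# Example: a batch of 4 faces, 12 AUs, 256-d features.
loss = au_alignment_loss(torch.randn(4, 256), torch.randn(12, 256),
                         torch.randint(0, 2, (4, 12)))
```

Treating alignment as multi-label classification rather than one-hot contrastive matching reflects that several AUs typically co-occur within a single expression.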

📝 Abstract
Facial Emotion Analysis (FEA) plays a crucial role in visual affective computing, aiming to infer a person's emotional state from facial data. Scientifically, facial expressions (FEs) result from the coordinated movement of facial muscles, which can be decomposed into specific action units (AUs) that provide detailed emotional insight. However, traditional methods often struggle with limited interpretability and constrained generalization and reasoning abilities. Recently, Multimodal Large Language Models (MLLMs) have shown exceptional performance in various visual tasks, yet they still face significant challenges in FEA owing to the lack of specialized datasets and their inability to capture the intricate relationships between FEs and AUs. To address these issues, we introduce a novel FEA Instruction Dataset that provides accurate, aligned FE and AU descriptions and establishes causal reasoning relationships between them, on which we construct a new benchmark, FEABench. Moreover, we propose FEALLM, a novel MLLM architecture designed to capture more detailed facial information, enhancing its capability in FEA tasks. Our model demonstrates strong performance on FEABench and impressive generalization through zero-shot evaluation on RAF-DB, AffectNet, BP4D, and DISFA, showcasing its robustness and effectiveness in FEA tasks. The dataset and code will be available at https://github.com/953206211/FEALLM.
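To illustrate what "accurate and aligned FE and AU descriptions" with causal links between them might look like as instruction-tuning data, here is a hypothetical record; the field names and wording are our guesses, since the page describes the dataset schema only at a high level.

```python
# Illustrative shape of one FEA instruction-tuning record (field names are
# assumptions, not the released schema): an AU observation, a causal
# reasoning step, and the resulting expression label.
sample = {
    "image": "faces/000123.jpg",  # hypothetical path
    "instruction": "Which action units are active, and what emotion do they imply?",
    "au_description": "AU4 (brow lowerer) and AU15 (lip corner depressor) are active.",
    "reasoning": "Lowered brows combined with depressed lip corners typically "
                 "co-occur in negative affect, pointing to sadness.",
    "expression": "sadness",
}
```
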
Problem

Research questions and friction points this paper is trying to address.

Enhancing facial emotion analysis in MLLMs with emotional synergy and reasoning
Addressing the limited interpretability and weak reasoning of traditional FEA methods
Overcoming the lack of specialized datasets linking FEs and AUs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces FEALLM, an MLLM architecture that captures detailed facial information
Develops the FEA Instruction Dataset with aligned FE and AU descriptions and causal reasoning
Proposes the FEABench benchmark for FEA evaluation (see the zero-shot evaluation sketch below)
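As referenced in the last point above, a minimal zero-shot transfer evaluation of the kind reported for RAF-DB and AffectNet could be scored as follows. `model.generate`, the prompt, and the label set are placeholders for illustration, not the FEALLM API.

```python
# A minimal zero-shot evaluation loop (our sketch; `model.generate` and the
# dataset iterator are placeholders): the MLLM answers a fixed prompt per
# image, and a prediction counts as correct if the ground-truth label name
# appears in the generated text.
PROMPT = "What emotion does this face express?"
LABELS = ["surprise", "fear", "disgust", "happiness", "sadness", "anger", "neutral"]

def zero_shot_accuracy(model, dataset) -> float:
    correct = 0
    for image, label in dataset:  # label is one of LABELS
        answer = model.generate(image, PROMPT).lower()
        pred = next((l for l in LABELS if l in answer), None)
        correct += (pred == label)
    return correct / len(dataset)
```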
Authors

Zhuozhao Hu
Tianjin University

Kaishen Yuan
The Hong Kong University of Science and Technology (Guangzhou)

Xin Liu
Lappeenranta-Lahti University of Technology LUT

Zitong Yu
U.S. Food and Drug Administration
Medical imaging, Deep learning, Machine learning, Image reconstruction

Yuan Zong
Southeast University
Affective Computing, Medical Artificial Intelligence, Digital Mental Health

Jingang Shi
Xi'an Jiaotong University
Computer vision, face analysis, image restoration, physiological signal analysis

Huanjing Yue
Tianjin University

Jingyu Yang
Tianjin University