FEALLM: Advancing Facial Emotion Analysis in Multimodal Large Language Models with Emotional Synergy and Reasoning

📅 2025-05-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional facial emotion analysis (FEA) methods suffer from poor interpretability, limited generalization, and weak reasoning, while existing multimodal large language models (MLLMs) lack fine-grained action unit (AU) modeling. Method: We introduce an FEA instruction dataset that aligns facial expression (FE) and AU descriptions and encodes causal reasoning between them, construct FEABench as its benchmark, and propose FEALLM, an MLLM architecture that captures finer-grained facial information. Technically, the approach combines visual encoder fine-tuning, AU-aware feature alignment, causally driven instruction tuning, and zero-shot cross-domain transfer. Results: Our method achieves state-of-the-art performance on FEABench and zero-shot accuracies of 89.7% (RAF-DB), 62.3% (AffectNet), 74.1% (BP4D), and 68.5% (DISFA), demonstrating strong generalization, robustness, and interpretability.
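To make the "AU-aware feature alignment" step above concrete, here is a minimal sketch of one plausible formulation: a multi-label loss that pulls pooled visual features toward frozen text embeddings of the action units active in each face. The function names, the loss choice, and the tensor shapes are our assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of AU-aware feature alignment (not the paper's code):
# align pooled visual features with per-AU text embeddings so that the
# language model receives AU-grounded visual representations.
import torch
import torch.nn.functional as F

def au_alignment_loss(visual_feats: torch.Tensor,
                      au_text_embeds: torch.Tensor,
                      au_labels: torch.Tensor,
                      temperature: float = 0.07) -> torch.Tensor:
    """visual_feats: (B, D) pooled face features from the visual encoder.
    au_text_embeds: (K, D) frozen text embeddings, one per action unit.
    au_labels: (B, K) multi-hot AU activations (a face can show several AUs).
    """
    v = F.normalize(visual_feats, dim=-1)
    t = F.normalize(au_text_embeds, dim=-1)
    logits = v @ t.T / temperature   # (B, K) similarity of each face to every AU
    # Multi-label alignment: binary cross-entropy over AU activations.
    return F.binary_cross_entropy_with_logits(logits, au_labels.float())

# Example: a batch of 4 faces, 12 AUs, 256-d features.
loss = au_alignment_loss(torch.randn(4, 256), torch.randn(12, 256),
                         torch.randint(0, 2, (4, 12)))
```

Treating alignment as multi-label classification rather than one-hot contrastive matching reflects that several AUs typically co-occur within a single expression.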

📝 Abstract
Facial Emotion Analysis (FEA) plays a crucial role in visual affective computing, aiming to infer a person's emotional state from facial data. Scientifically, facial expressions (FEs) result from the coordinated movement of facial muscles, which can be decomposed into specific action units (AUs) that provide detailed emotional insight. However, traditional methods often struggle with limited interpretability and constrained generalization and reasoning abilities. Recently, Multimodal Large Language Models (MLLMs) have shown exceptional performance in various visual tasks, yet they still face significant challenges in FEA owing to the lack of specialized datasets and their inability to capture the intricate relationships between FEs and AUs. To address these issues, we introduce a novel FEA Instruction Dataset that provides accurate, aligned FE and AU descriptions and establishes causal reasoning relationships between them, on which we construct a new benchmark, FEABench. Moreover, we propose FEALLM, a novel MLLM architecture designed to capture more detailed facial information, enhancing its capability in FEA tasks. Our model demonstrates strong performance on FEABench and impressive generalization through zero-shot evaluation on RAF-DB, AffectNet, BP4D, and DISFA, showcasing its robustness and effectiveness in FEA tasks. The dataset and code will be available at https://github.com/953206211/FEALLM.
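To illustrate what "accurate and aligned FE and AU descriptions" with causal links between them might look like as instruction-tuning data, here is a hypothetical record; the field names and wording are our guesses, since the page describes the dataset schema only at a high level.

```python
# Illustrative shape of one FEA instruction-tuning record (field names are
# assumptions, not the released schema): an AU observation, a causal
# reasoning step, and the resulting expression label.
sample = {
    "image": "faces/000123.jpg",  # hypothetical path
    "instruction": "Which action units are active, and what emotion do they imply?",
    "au_description": "AU4 (brow lowerer) and AU15 (lip corner depressor) are active.",
    "reasoning": "Lowered brows combined with depressed lip corners typically "
                 "co-occur in negative affect, pointing to sadness.",
    "expression": "sadness",
}
```
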
Problem

Research questions and friction points this paper is trying to address.

Enhancing facial emotion analysis in MLLMs with emotional synergy and reasoning
Addressing the limited interpretability and weak reasoning of traditional FEA methods
Overcoming the lack of specialized datasets linking FEs and AUs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces FEALLM, an MLLM architecture that captures detailed facial information
Develops the FEA Instruction Dataset with aligned FE and AU descriptions and causal reasoning
Proposes the FEABench benchmark for FEA evaluation (see the zero-shot evaluation sketch below)
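As referenced in the last point above, a minimal zero-shot transfer evaluation of the kind reported for RAF-DB and AffectNet could be scored as follows. `model.generate`, the prompt, and the label set are placeholders for illustration, not the FEALLM API.

```python
# A minimal zero-shot evaluation loop (our sketch; `model.generate` and the
# dataset iterator are placeholders): the MLLM answers a fixed prompt per
# image, and a prediction counts as correct if the ground-truth label name
# appears in the generated text.
PROMPT = "What emotion does this face express?"
LABELS = ["surprise", "fear", "disgust", "happiness", "sadness", "anger", "neutral"]

def zero_shot_accuracy(model, dataset) -> float:
    correct = 0
    for image, label in dataset:  # label is one of LABELS
        answer = model.generate(image, PROMPT).lower()
        pred = next((l for l in LABELS if l in answer), None)
        correct += (pred == label)
    return correct / len(dataset)
```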
Authors

Zhuozhao Hu
Tianjin University

Kaishen Yuan
The Hong Kong University of Science and Technology (Guangzhou)

Xin Liu
Lappeenranta-Lahti University of Technology LUT

Zitong Yu
U.S. Food and Drug Administration
Medical imaging, Deep learning, Machine learning, Image reconstruction

Yuan Zong
Southeast University
Affective Computing, Medical Artificial Intelligence, Digital Mental Health

Jingang Shi
Xi'an Jiaotong University
Computer vision, face analysis, image restoration, physiological signal analysis

Huanjing Yue
Tianjin University

Jingyu Yang
Tianjin University