MICACL: Multi-Instance Category-Aware Contrastive Learning for Long-Tailed Dynamic Facial Expression Recognition

📅 2025-09-04
🤖 AI Summary
Dynamic facial expression recognition (DFER) faces two key challenges: model bias induced by long-tailed class distributions and the complexity of spatiotemporal feature modeling. To address these, we propose a multi-instance category-aware contrastive learning framework. First, a graph-enhanced instance interaction module models dynamic relationships among video clips via an adaptive adjacency matrix. Second, a weighted instance aggregation network performs importance-aware spatiotemporal feature fusion. Third, a multiscale category-aware contrastive learning mechanism, which integrates class-aware sampling and multiscale convolutions, mitigates training imbalance. By combining graph neural networks with attention mechanisms, the method achieves state-of-the-art performance on the DFEW and FERV39k benchmarks. It also improves minority-class accuracy, robustness, and generalization, showing its effectiveness on long-tailed DFER tasks.
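To make the idea of graph-based interaction among video clips concrete, here is a minimal sketch in plain Python. It assumes the adaptive adjacency matrix is rebuilt from the instance features themselves as a row-wise softmax of scaled dot-product similarities, A = softmax(XXᵀ/√d), followed by one residual message-passing step H = X + ReLU(AXW). The function names, the identity weight matrix, and the toy features are hypothetical; this is not the authors' module, which also uses multiscale convolutions and learned parameters.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of an (n x k) matrix and a (k x m) matrix."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def row_softmax(a: Matrix) -> Matrix:
    """Turn each row into a probability distribution (max-shifted for stability)."""
    out = []
    for row in a:
        m = max(row)
        exps = [math.exp(v - m) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def adaptive_adjacency(x: Matrix) -> Matrix:
    """ASSUMED form: A = softmax(X X^T / sqrt(d)), rebuilt from the current features."""
    d = len(x[0])
    scores = matmul(x, transpose(x))
    return row_softmax([[s / math.sqrt(d) for s in row] for row in scores])


def graph_interaction(x: Matrix, w: Matrix) -> Matrix:
    """One residual message-passing step: H = X + ReLU(A X W); W must be d x d."""
    a = adaptive_adjacency(x)
    messages = matmul(matmul(a, x), w)
    return [
        [xi + max(0.0, mi) for xi, mi in zip(x_row, m_row)]
        for x_row, m_row in zip(x, messages)
    ]


# Hypothetical toy input: 4 clip-level instance features of dimension 3.
clips = [[1.0, 0.0, 0.5], [0.9, 0.1, 0.4], [0.0, 1.0, 0.2], [0.1, 0.9, 0.3]]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
adj = adaptive_adjacency(clips)
updated = graph_interaction(clips, identity)
print([round(v, 3) for v in adj[0]])
```

Because the adjacency is recomputed from the features, clips with similar content (here clips 0 and 1) exchange more information than dissimilar ones. A learned variant would typically parameterize the similarity, for example with trainable projections, rather than using raw dot products.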

📝 Abstract
Dynamic facial expression recognition (DFER) faces significant challenges due to long-tailed category distributions and the complexity of spatio-temporal feature modeling. While existing deep learning-based methods have improved DFER performance, they often fail to address these issues, resulting in severe model induction bias. To overcome these limitations, we propose a novel multi-instance learning framework called MICACL, which integrates spatio-temporal dependency modeling and long-tailed contrastive learning optimization. Specifically, we design the Graph-Enhanced Instance Interaction Module (GEIIM) to capture intricate spatio-temporal relationships between adjacent instances through adaptive adjacency matrices and multiscale convolutions. To enhance instance-level feature aggregation, we develop the Weighted Instance Aggregation Network (WIAN), which dynamically assigns weights based on instance importance. Furthermore, we introduce a Multiscale Category-aware Contrastive Learning (MCCL) strategy to balance training between major and minor categories. Extensive experiments on in-the-wild datasets (i.e., DFEW and FERV39k) demonstrate that MICACL achieves state-of-the-art performance with superior robustness and generalization.
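The importance-based weighting that WIAN performs can be illustrated with generic attention-based multiple-instance pooling, swapped in here as a stand-in: each instance h gets a score w·tanh(Vh), the scores are softmax-normalized into importance weights, and the video-level embedding is the weighted sum of instances. The names `attention_pool`, `V`, and `w`, and the toy clip features are hypothetical; the paper's actual weighting function may differ.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(instances: List[Vector], v: List[Vector], w: Vector) -> Tuple[Vector, Vector]:
    """Attention-based MIL pooling (a stand-in for WIAN, not the paper's module).

    score_i = w . tanh(V h_i); alpha = softmax(scores); bag = sum_i alpha_i h_i.
    """
    scores = []
    for h in instances:
        hidden = [math.tanh(sum(vj * hj for vj, hj in zip(v_row, h))) for v_row in v]
        scores.append(sum(wk * zk for wk, zk in zip(w, hidden)))
    alpha = softmax(scores)
    dim = len(instances[0])
    # Weighted sum of instances: a convex combination, since alpha sums to 1.
    bag = [sum(a * h[j] for a, h in zip(alpha, instances)) for j in range(dim)]
    return bag, alpha


# Hypothetical toy bag: 3 clip features; the middle clip is the most expressive.
clips = [[0.2, 0.1], [2.0, 1.5], [0.1, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical projection (identity here)
w = [1.0, 1.0]                # hypothetical scoring vector
video_embedding, weights = attention_pool(clips, V, w)
print([round(a, 3) for a in weights], [round(x, 3) for x in video_embedding])
```

In a trained model, `V` and `w` would be learned jointly with the classifier, so the weights reflect which clips are informative for the expression label rather than raw feature magnitude as in this toy example.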
Problem

Addresses long-tailed category distributions in dynamic facial expression recognition
Overcomes spatio-temporal feature modeling complexity in expression analysis
Reduces model induction bias through contrastive learning optimization
Innovation

Graph-Enhanced Instance Interaction Module captures spatio-temporal dependencies
Weighted Instance Aggregation Network dynamically assigns instance importance
Multiscale Category-aware Contrastive Learning balances major and minor categories
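One way to see how a contrastive objective can be rebalanced between major and minor categories is a supervised contrastive loss whose per-anchor terms are weighted by inverse class frequency. This is a generic stand-in, not MCCL itself: the inverse-frequency weighting, the temperature `tau = 0.1`, the function names, and the class counts in the usage lines are all assumptions for illustration.

```python
import math
from typing import Dict, List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against zero vectors
    return [x / norm for x in v]


def inverse_frequency_weights(counts: Dict[int, int]) -> Dict[int, float]:
    """ASSUMED weighting: w_c proportional to 1 / n_c, rescaled to average 1."""
    inv = {c: 1.0 / n for c, n in counts.items()}
    mean = sum(inv.values()) / len(inv)
    return {c: val / mean for c, val in inv.items()}


def weighted_supcon_loss(features: List[Vector], labels: List[int],
                         counts: Dict[int, int], tau: float = 0.1) -> float:
    """Supervised contrastive loss with per-anchor class weights (stand-in, not MCCL)."""
    z = [l2_normalize(f) for f in features]
    class_w = inverse_frequency_weights(counts)
    n = len(z)
    total, weight_sum = 0.0, 0.0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner in the batch is skipped
        # Cosine-similarity logits against every other sample in the batch.
        logits = {a: sum(x * y for x, y in zip(z[i], z[a])) / tau for a in range(n) if a != i}
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        anchor_loss = -sum(logits[p] - log_denom for p in positives) / len(positives)
        total += class_w[labels[i]] * anchor_loss
        weight_sum += class_w[labels[i]]
    return total / weight_sum if weight_sum > 0 else 0.0


# Hypothetical batch: two well-separated classes; class 1 is the minority class.
feats = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
counts = {0: 900, 1: 100}  # hypothetical dataset-level class counts
aligned = weighted_supcon_loss(feats, [0, 0, 1, 1], counts)
mismatched = weighted_supcon_loss(feats, [0, 1, 0, 1], counts)
print(round(aligned, 4), round(mismatched, 4))
```

With these counts the minority class receives nine times the weight of the majority class, so errors on minority-class anchors dominate the gradient instead of being drowned out. The loss is small when same-class features cluster (`aligned`) and large when labels cut across the clusters (`mismatched`).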
Authors
Feng-Qi Cui
University of Science and Technology of China
Multimedia, Trustworthy AI, LLM, AI4S
Zhen Lin
Hefei University of Technology, Hefei, China
Xinlong Rao
University of Science and Technology of China, Hefei, China
Anyang Tong
Hefei University of Technology
Shiyao Li
PhD student at Imagine (IP Paris) and Willow (Inria)
Computer Vision, Deep Learning, Robotics
Fei Wang
Hefei University of Technology, Hefei, China
Changlin Chen
University of Science and Technology of China, Hefei, China
Bin Liu
University of Science and Technology of China, Hefei, China