MICACL: Multi-Instance Category-Aware Contrastive Learning for Long-Tailed Dynamic Facial Expression Recognition

📅 2025-09-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Dynamic facial expression recognition (DFER) faces two key challenges: model bias induced by long-tailed class distributions and the complexity of spatiotemporal feature modeling. To address these, the paper proposes MICACL, a multi-instance category-aware contrastive learning framework with three components. First, a graph-enhanced instance interaction module models relationships among video clips (the instances) via an adaptive adjacency matrix and multi-scale convolutions. Second, a weighted instance aggregation network fuses instance features according to their importance. Third, a multi-scale category-aware contrastive learning mechanism, which uses class-aware sampling, balances training between majority and minority classes. The authors report state-of-the-art performance on the DFEW and FERV39k benchmarks, with improved minority-class accuracy, robustness, and generalization on long-tailed DFER.
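
The summary mentions class-aware sampling without giving details. As a rough illustration only, the sketch below uses a common two-stage scheme (pick a class uniformly, then pick a sample within that class); whether MICACL samples this way is an assumption, and the function name `class_aware_sample` and the toy labels are hypothetical.

```python
import random
from collections import defaultdict

def class_aware_sample(labels, num_samples, seed=0):
    """Draw indices so each class is equally likely, regardless of its size.

    Assumed two-stage scheme, not taken from the paper.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    classes = sorted(by_class)
    picks = []
    for _ in range(num_samples):
        chosen_class = rng.choice(classes)               # stage 1: uniform over classes
        picks.append(rng.choice(by_class[chosen_class])) # stage 2: uniform within class
    return picks

# Toy long-tailed label set: 90 majority samples, 10 minority samples.
labels = ["happy"] * 90 + ["disgust"] * 10
picks = class_aware_sample(labels, 1000)
minority_share = sum(labels[i] == "disgust" for i in picks) / len(picks)
print(f"minority share in sampled batch: {minority_share:.2f}")
```

With plain uniform sampling the minority class would make up about 10% of the draws; under this scheme it makes up roughly half, at the cost of repeating minority samples more often.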

📝 Abstract

Dynamic facial expression recognition (DFER) faces significant challenges due to long-tailed category distributions and the complexity of spatio-temporal feature modeling. While existing deep learning-based methods have improved DFER performance, they often fail to address these issues, resulting in severe model induction bias. To overcome these limitations, we propose a novel multi-instance learning framework called MICACL, which integrates spatio-temporal dependency modeling and long-tailed contrastive learning optimization. Specifically, we design the Graph-Enhanced Instance Interaction Module (GEIIM) to capture intricate spatio-temporal relationships between adjacent instances through adaptive adjacency matrices and multiscale convolutions. To enhance instance-level feature aggregation, we develop the Weighted Instance Aggregation Network (WIAN), which dynamically assigns weights based on instance importance. Furthermore, we introduce a Multiscale Category-aware Contrastive Learning (MCCL) strategy to balance training between major and minor categories. Extensive experiments on in-the-wild datasets (i.e., DFEW and FERV39k) demonstrate that MICACL achieves state-of-the-art performance with superior robustness and generalization.
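
To make the two structural ideas in the abstract more concrete, here is a minimal sketch of (a) an adjacency matrix computed from the instance features themselves and (b) importance-weighted pooling of instances into one video-level feature. It is not the paper's GEIIM or WIAN: the dot-product similarity, the single propagation step, the fixed `score_vector`, and all function names are assumptions made for illustration, and the multiscale convolutions are omitted.

```python
import math

def softmax(xs):
    m = max(xs)                                  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def adaptive_adjacency(instances):
    """Row-normalised similarity matrix: row i says how strongly instance i attends to each j."""
    return [softmax([dot(hi, hj) for hj in instances]) for hi in instances]

def propagate(adj, instances):
    """One message-passing step: each instance becomes a weighted mix of all instances."""
    dim = len(instances[0])
    return [[sum(a * h[d] for a, h in zip(row, instances)) for d in range(dim)]
            for row in adj]

def weighted_aggregate(instances, score_vector):
    """Attention-style pooling: a softmax over per-instance scores gives the weights."""
    weights = softmax([dot(h, score_vector) for h in instances])
    dim = len(instances[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, instances)) for d in range(dim)]
    return pooled, weights

# Three toy clip features (instances) from one video; the values are made up.
clips = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
adj = adaptive_adjacency(clips)
mixed = propagate(adj, clips)
video_feature, weights = weighted_aggregate(mixed, score_vector=[1.0, -1.0])
print(weights, video_feature)
```

In a trained model the similarity function and the scoring vector would be learned; here they are fixed so the data flow from clips to a single video feature is easy to follow.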

Problem

Research questions and friction points this paper is trying to address.

Addresses long-tailed category distributions in dynamic facial expression recognition

Overcomes spatio-temporal feature modeling complexity in expression analysis

Reduces model induction bias through contrastive learning optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph-Enhanced Instance Interaction Module captures spatio-temporal dependencies

Weighted Instance Aggregation Network dynamically assigns instance importance

Multiscale Category-aware Contrastive Learning balances major and minor categories
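
The page does not give the MCCL objective. As a stand-in, the sketch below uses the standard supervised contrastive loss and averages it per class before averaging across classes, so that minority classes count as much as majority ones. This class-balanced averaging is an assumption chosen to illustrate the idea of category-aware balancing, not the paper's formula; the function name and toy features are hypothetical.

```python
import math
from collections import defaultdict

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def class_balanced_supcon(features, labels, temperature=0.1):
    """Supervised contrastive loss, averaged within each class and then across classes."""
    feats = [normalize(f) for f in features]
    n = len(feats)
    per_class = defaultdict(list)
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-class partner has no contrastive term
        # Scaled cosine similarity between the anchor and every other sample.
        logits = {j: sum(a * b for a, b in zip(feats[i], feats[j])) / temperature
                  for j in range(n) if j != i}
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        loss_i = -sum(logits[j] - log_denom for j in positives) / len(positives)
        per_class[labels[i]].append(loss_i)
    if not per_class:
        return 0.0
    class_means = [sum(v) / len(v) for v in per_class.values()]
    return sum(class_means) / len(class_means)  # every class gets equal weight

# Toy embeddings: two tight clusters. Labels that match the clusters should score lower.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_good = class_balanced_supcon(features, [0, 0, 1, 1])
loss_bad = class_balanced_supcon(features, [0, 1, 0, 1])
print(loss_good, loss_bad)
```

The loss is small when same-class embeddings sit close together and large when they are scattered, which is the signal a contrastive objective uses to pull minority-class samples into a compact cluster.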

🔎 Similar Papers

Rethinking the Learning Paradigm for Facial Expression Recognition