LEAF: Unveiling Two Sides of the Same Coin in Semi-supervised Facial Expression Recognition

📅 2024-04-23
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address low-quality pseudo-labels and insufficient modeling of expression semantics under label scarcity in semi-supervised facial expression recognition (FER), this paper proposes LEAF, a hierarchical decoupling and fusing framework. LEAF jointly optimizes expression-relevant representation learning and pseudo-label generation through an expression-aware aggregation strategy at three levels (semantic, instance, and category): (i) at the semantic and instance levels, representations are decoupled into expression-agnostic and expression-relevant components and adaptively fused with learnable gating weights; (ii) at the category level, predictions are decoupled into positive and negative parts to assign ambiguous pseudo-labels; and (iii) a consistency loss enforces agreement between two augmented views of the same image. Experiments on benchmark datasets show that LEAF outperforms state-of-the-art semi-supervised FER methods, and it reportedly reaches fully supervised performance with only 10% labeled data. The aggregation strategy can also be integrated into existing semi-supervised frameworks to improve their performance.
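
The category-level idea in the summary (splitting a prediction into positive and negative parts, then checking agreement across views) can be illustrated with a minimal sketch. This is not the authors' implementation: the thresholds `pos_thresh` and `neg_thresh`, the function names, and the squared-error form of the consistency term are all assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def split_pseudo_label(probs, pos_thresh=0.3, neg_thresh=0.05):
    """Split a prediction into positive and negative class sets.

    Hypothetical rule: classes at or above pos_thresh are kept as
    candidate (possibly ambiguous) labels; classes at or below
    neg_thresh are treated as ruled out. Both thresholds are
    illustrative assumptions, not values from the paper.
    """
    positive = [i for i, p in enumerate(probs) if p >= pos_thresh]
    negative = [i for i, p in enumerate(probs) if p <= neg_thresh]
    return positive, negative


def view_consistency(probs_a, probs_b):
    """Mean squared difference between predictions for two augmented views.

    A stand-in for a consistency loss; the paper's exact form may differ.
    """
    return sum((a - b) ** 2 for a, b in zip(probs_a, probs_b)) / len(probs_a)


if __name__ == "__main__":
    probs = [0.45, 0.40, 0.10, 0.03, 0.02]
    print(split_pseudo_label(probs))
    print(view_consistency(probs, probs))
```

With two classes close in probability, both land in the positive set, which is one way an "ambiguous" pseudo-label could keep more than one candidate expression instead of forcing a single hard label.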

📝 Abstract
Semi-supervised learning has emerged as a promising approach to tackle the challenge of label scarcity in the facial expression recognition (FER) task. However, current state-of-the-art methods primarily focus on one side of the coin, i.e., generating high-quality pseudo-labels, while overlooking the other side: enhancing expression-relevant representations. In this paper, we unveil both sides of the coin by proposing a unified framework termed hierarchicaL dEcoupling And Fusing (LEAF) to coordinate expression-relevant representations and pseudo-labels for semi-supervised FER. LEAF introduces a hierarchical expression-aware aggregation strategy that operates at three levels: semantic, instance, and category. (1) At the semantic and instance levels, LEAF decouples representations into expression-agnostic and expression-relevant components, and adaptively fuses them using learnable gating weights. (2) At the category level, LEAF assigns ambiguous pseudo-labels by decoupling predictions into positive and negative parts, and employs a consistency loss to ensure agreement between two augmented views of the same image. Extensive experiments on benchmark datasets demonstrate that by unveiling and harmonizing both sides of the coin, LEAF outperforms state-of-the-art semi-supervised FER methods, effectively leveraging both labeled and unlabeled data. Moreover, the proposed expression-aware aggregation strategy can be seamlessly integrated into existing semi-supervised frameworks, leading to significant performance gains. Our code is available at https://github.com/zfkarl/LEAF.
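
The fusion of expression-relevant and expression-agnostic components with learnable gating weights can be pictured with a small sketch. It assumes an element-wise sigmoid gate over plain Python lists; the function names and the gate parameterization are hypothetical, and the actual LEAF code in the repository above may be structured differently.

```python
import math


def sigmoid(x):
    """Map a real-valued gate logit into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(relevant, agnostic, gate_logits):
    """Fuse two feature vectors element-wise with a sigmoid gate.

    Each output element is w * relevant + (1 - w) * agnostic, where
    w = sigmoid(gate_logit). In a trained model the gate logits would
    be learnable parameters; here they are passed in directly
    (an assumption made to keep the sketch self-contained).
    """
    if not (len(relevant) == len(agnostic) == len(gate_logits)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for r, a, g in zip(relevant, agnostic, gate_logits):
        w = sigmoid(g)
        fused.append(w * r + (1.0 - w) * a)
    return fused


if __name__ == "__main__":
    # A zero logit gives w = 0.5, i.e. a plain average of the two parts.
    print(gated_fusion([1.0, 2.0], [3.0, 4.0], [0.0, 0.0]))
```

A large positive gate logit pushes the fused feature toward the expression-relevant component, while a large negative one pushes it toward the expression-agnostic component, so the gate decides per element how much of each part to keep.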
Problem

Research questions and friction points this paper is trying to address.

Facial Expression Recognition
Semi-supervised Learning
Pseudo-label Enhancement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pseudo-label Refinement
Multi-level Information Integration
Consistency Loss for Uncertainty Handling
Fan Zhang
College of Big Data and Internet, Shenzhen Technology University, Shenzhen, Guangdong, China; Georgia Institute of Technology, Atlanta, Georgia, USA
Zhi-Qi Cheng
Assistant Professor @ UW | Graduate Faculty | Ex-CMU, Google, Microsoft | Intel & IBM PhD Fellowship
multimedia processing, multimedia understanding, multimodal foundation model
Jianjun Zhao
Kyushu University
Software Engineering, Programming Languages
Xiaojiang Peng
Shenzhen Technology University
Computer Vision, Facial Expression Recognition, Multimodal Emotion Recognition
Xuelong Li
Institute of Artificial Intelligence (TeleAI), China Telecom, Beijing, China; Northwestern Polytechnical University, Xi’an, Shaanxi, China