🤖 AI Summary
To address low-quality pseudo-labels and insufficient modeling of expression semantics, both caused by label scarcity in semi-supervised facial expression recognition (FER), this paper proposes LEAF, a hierarchical decoupling and fusing framework. LEAF jointly optimizes expression-relevant representation learning and high-quality pseudo-label generation: (i) it introduces an expression-aware aggregation strategy that operates at three levels (semantic, instance, and category); (ii) at the semantic and instance levels, representations are decoupled into expression-agnostic and expression-relevant components, which are then adaptively fused via learnable gating weights; and (iii) at the category level, predictions are decomposed into positive and negative parts to assign ambiguous pseudo-labels, and a consistency loss enforces agreement between two augmented views of the same image. Extensive experiments show that LEAF outperforms state-of-the-art methods on mainstream benchmarks, and it matches fully supervised performance using only 10% of the labeled data. The aggregation components are modular and plug-and-play, so they can be added to existing semi-supervised FER frameworks to improve their performance.
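The learnable gating mentioned above can be illustrated with a minimal sketch. The source does not specify the form of the gate, so this assumes one scalar gate logit per feature dimension, squashed by a sigmoid so that each output dimension is a convex combination of the two decoupled components. The functions `sigmoid` and `gated_fuse` are hypothetical names, and the two components are taken as inputs rather than produced by LEAF's actual decoupling modules.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a logit to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fuse(
    relevant: List[float], agnostic: List[float], gate_logits: List[float]
) -> List[float]:
    """Fuse two decoupled feature components with per-dimension learnable gates.

    Assumption (not from the source): gate w = sigmoid(g) weights the
    expression-relevant part and (1 - w) weights the expression-agnostic part.
    """
    if not (len(relevant) == len(agnostic) == len(gate_logits)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for r, a, g in zip(relevant, agnostic, gate_logits):
        w = sigmoid(g)  # gate weight in (0, 1); g would be a trained parameter
        fused.append(w * r + (1.0 - w) * a)
    return fused
```

With zero gate logits both components receive weight 0.5, so `gated_fuse([1.0, 2.0], [3.0, 4.0], [0.0, 0.0])` returns `[2.0, 3.0]`. The sigmoid keeps each weight strictly between 0 and 1, so training can shift emphasis toward either component without discarding the other entirely.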
📝 Abstract
Semi-supervised learning has emerged as a promising approach to label scarcity in the facial expression recognition (FER) task. However, current state-of-the-art methods focus primarily on one side of the coin, i.e., generating high-quality pseudo-labels, while overlooking the other side: enhancing expression-relevant representations. In this paper, we unveil both sides of the coin by proposing a unified framework termed hierarchicaL dEcoupling And Fusing (LEAF), which coordinates expression-relevant representations and pseudo-labels for semi-supervised FER. LEAF introduces a hierarchical expression-aware aggregation strategy that operates at three levels: semantic, instance, and category. (1) At the semantic and instance levels, LEAF decouples representations into expression-agnostic and expression-relevant components and adaptively fuses them using learnable gating weights. (2) At the category level, LEAF assigns ambiguous pseudo-labels by decoupling predictions into positive and negative parts, and employs a consistency loss to ensure agreement between two augmented views of the same image. Extensive experiments on benchmark datasets demonstrate that, by unveiling and harmonizing both sides of the coin, LEAF outperforms state-of-the-art semi-supervised FER methods, effectively leveraging both labeled and unlabeled data. Moreover, the proposed expression-aware aggregation strategy can be seamlessly integrated into existing semi-supervised frameworks, leading to significant performance gains. Our code is available at https://github.com/zfkarl/LEAF.
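The category-level step can be sketched for a single unlabeled image. The abstract does not give the exact decomposition or loss, so this sketch swaps in a common construction as an assumption: classes whose weak-view probability is at least `pos_thresh` form a positive candidate set (an ambiguous, possibly multi-class pseudo-label), classes at or below `neg_thresh` form a negative set of ruled-out classes, and the strong view is penalized with a partial-label term on the positive set plus a complementary-label term on the negative set. The thresholds, the names `decouple_prediction` and `consistency_loss`, and the loss form are all hypothetical.

```python
import math
from typing import List, Set, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decouple_prediction(
    probs: List[float], pos_thresh: float, neg_thresh: float
) -> Tuple[Set[int], Set[int]]:
    """Split class indices into a positive (candidate) set and a negative (ruled-out) set."""
    if not neg_thresh < pos_thresh:
        raise ValueError("neg_thresh must be smaller than pos_thresh")
    positive = {k for k, p in enumerate(probs) if p >= pos_thresh}
    if not positive:
        # Fall back to the arg-max class so every image keeps at least one candidate.
        positive = {max(range(len(probs)), key=probs.__getitem__)}
    negative = {k for k, p in enumerate(probs) if p <= neg_thresh} - positive
    return positive, negative


def consistency_loss(
    weak_logits: List[float],
    strong_logits: List[float],
    pos_thresh: float = 0.3,
    neg_thresh: float = 0.05,
    eps: float = 1e-12,
) -> float:
    """Penalize the strong view for disagreeing with the weak view's decoupled pseudo-label."""
    # The weak view only supplies targets (a real trainer would apply a stop-gradient here).
    positive, negative = decouple_prediction(softmax(weak_logits), pos_thresh, neg_thresh)
    strong = softmax(strong_logits)
    # Positive part: the strong view should put its mass somewhere inside the candidate set.
    pos_loss = -math.log(sum(strong[k] for k in positive) + eps)
    # Negative part: the strong view should put little mass on each ruled-out class.
    neg_loss = -sum(math.log(1.0 - strong[k] + eps) for k in negative)
    return pos_loss + neg_loss / max(len(negative), 1)
```

For two identical, confident views the loss is close to zero, while moving the strong view's peak onto a ruled-out class drives both terms up. Using a candidate set rather than a single arg-max class is what lets an ambiguous image (for example, weak-view probabilities split between two expressions) keep both classes as plausible labels instead of being forced into one.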