Point-Supervised Facial Expression Spotting with Gaussian-Based Instance-Adaptive Intensity Modeling

📅 2025-11-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the high annotation cost in automatic facial expression spotting caused by reliance on dense temporal boundary labels. We propose point-supervised facial expression spotting (P-FES), which requires only a single timestamp per expression instance. Methodologically, we design a dual-branch network: a class-agnostic intensity branch employs Gaussian-based instance-adaptive intensity modeling to generate intensity-aware soft pseudo-labels, while a class-aware apex classification branch explicitly decouples macro-/micro-expression classification from intensity estimation. Additionally, an intensity-aware contrastive loss is introduced to suppress interference from neutral frames. To our knowledge, this is the first work to disentangle intensity modeling and classification under point-level supervision. Extensive experiments on SAMM-LV, CAS(ME)², and CAS(ME)³ demonstrate that P-FES significantly outperforms existing weakly supervised methods, validating its effectiveness and robustness.

📝 Abstract
Automatic facial expression spotting, which aims to identify facial expression instances in untrimmed videos, is crucial for facial expression analysis. Existing methods primarily focus on fully-supervised learning and rely on costly, time-consuming temporal boundary annotations. In this paper, we investigate point-supervised facial expression spotting (P-FES), where only a single timestamp annotation per instance is required for training. We propose a unique two-branch framework for P-FES. First, to mitigate the limitation of hard pseudo-labeling, which often confuses neutral and expression frames with various intensities, we propose a Gaussian-based instance-adaptive intensity modeling (GIM) module to model instance-level expression intensity distribution for soft pseudo-labeling. By detecting the pseudo-apex frame around each point label, estimating the duration, and constructing an instance-level Gaussian distribution, GIM assigns soft pseudo-labels to expression frames for more reliable intensity supervision. The GIM module is incorporated into our framework to optimize the class-agnostic expression intensity branch. Second, we design a class-aware apex classification branch that distinguishes macro- and micro-expressions solely based on their pseudo-apex frames. During inference, the two branches work independently: the class-agnostic expression intensity branch generates expression proposals, while the class-aware apex classification branch is responsible for macro- and micro-expression classification. Furthermore, we introduce an intensity-aware contrastive loss to enhance discriminative feature learning and suppress neutral noise by contrasting neutral frames with expression frames with various intensities. Extensive experiments on the SAMM-LV, CAS(ME)$^2$, and CAS(ME)$^3$ datasets demonstrate the effectiveness of our proposed framework.
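The GIM procedure described in the abstract (detect a pseudo-apex near the point label, estimate the instance duration, then assign Gaussian soft pseudo-labels) can be sketched as follows. This is a minimal illustration under assumed details: the half-apex duration heuristic, the window size `search_radius`, and `sigma_scale` are placeholders, not the paper's exact formulation.

```python
import numpy as np

def gim_soft_pseudo_labels(intensity, point_idx, search_radius=15, sigma_scale=0.3):
    """Sketch of Gaussian-based instance-adaptive intensity modeling (GIM).

    intensity: per-frame expression-intensity scores for one video (1-D array).
    point_idx: the single point-label timestamp for this instance.
    Returns the pseudo-apex index and per-frame soft pseudo-labels.
    """
    T = len(intensity)
    # (1) Pseudo-apex: highest-intensity frame in a window around the point label.
    lo = max(0, point_idx - search_radius)
    hi = min(T, point_idx + search_radius + 1)
    apex = lo + int(np.argmax(intensity[lo:hi]))

    # (2) Duration estimate (assumed heuristic): expand left/right while
    # intensity stays above half the apex score.
    thresh = 0.5 * intensity[apex]
    left = apex
    while left > 0 and intensity[left - 1] >= thresh:
        left -= 1
    right = apex
    while right < T - 1 and intensity[right + 1] >= thresh:
        right += 1
    duration = right - left + 1

    # (3) Instance-level Gaussian: soft labels peak at the apex and decay
    # with distance, scaled by the estimated duration.
    sigma = max(1.0, sigma_scale * duration)
    frames = np.arange(T)
    soft = np.exp(-0.5 * ((frames - apex) / sigma) ** 2)
    return apex, soft

# Toy intensity curve with one expression peaking at frame 20;
# the point label (frame 18) need not coincide with the true apex.
intensity = np.exp(-0.5 * ((np.arange(50) - 20) / 4.0) ** 2)
apex, soft = gim_soft_pseudo_labels(intensity, point_idx=18)
```

Because the Gaussian width is tied to the estimated duration, long macro-expressions receive broader soft labels than brief micro-expressions, which is the instance-adaptive aspect the abstract emphasizes.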
Problem

Research questions and friction points this paper is trying to address.

Spots facial expressions in videos using only single timestamp annotations
Models expression intensity distribution with Gaussian-based adaptive labeling
Classifies macro- and micro-expressions through dual-branch framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian-based instance-adaptive intensity modeling for soft labeling
Two-branch framework with intensity and classification branches
Intensity-aware contrastive loss for feature learning
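The intensity-aware contrastive loss listed above can be illustrated with an InfoNCE-style sketch. The exact objective is not given in this summary, so the form below is an assumption: expression frames are pulled together and pushed away from neutral frames, with each expression frame weighted by its soft intensity label so low-intensity frames contribute less; `tau` and `neutral_thresh` are hypothetical parameters.

```python
import numpy as np

def intensity_aware_contrastive_loss(feats, soft_labels, tau=0.1, neutral_thresh=0.1):
    """Illustrative intensity-aware contrastive loss (assumed form).

    feats: (T, D) frame features; soft_labels: (T,) soft intensity labels.
    Frames with soft intensity below neutral_thresh are treated as neutral.
    """
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)  # L2-normalize
    expr = np.where(soft_labels >= neutral_thresh)[0]
    neu = np.where(soft_labels < neutral_thresh)[0]
    if len(expr) < 2 or len(neu) == 0:
        return 0.0
    sim = feats @ feats.T / tau  # temperature-scaled cosine similarities
    loss, weight_sum = 0.0, 0.0
    for i in expr:
        pos = [j for j in expr if j != i]
        # InfoNCE: positives = other expression frames, negatives = neutral frames
        num = np.sum(np.exp(sim[i, pos]))
        den = num + np.sum(np.exp(sim[i, neu]))
        loss += soft_labels[i] * (-np.log(num / den))
        weight_sum += soft_labels[i]
    return loss / weight_sum

# Toy example: two expression frames share a direction, one neutral frame differs.
feats = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
soft_labels = np.array([0.9, 0.8, 0.0])
loss = intensity_aware_contrastive_loss(feats, soft_labels)
```

Weighting by the soft label means ambiguous onset/offset frames exert less pull than apex frames, which matches the stated goal of suppressing neutral-frame interference.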
🔎 Similar Papers
No similar papers found.
Yicheng Deng
Graduate School of Information Science and Technology, The University of Osaka, Suita, 565-0871, Japan
Hideaki Hayashi
D3 Center, The University of Osaka, Suita, 565-0871, Japan
Hajime Nagahara
Professor, The University of Osaka
Computational Photography · Computer Vision