🤖 AI Summary
This study addresses the unreliability of conventional subject-exclusive cross-validation in facial action unit (AU) detection, which is susceptible to random partitioning noise that obscures genuine performance improvements. For the first time, the work systematically quantifies the stochastic variance introduced by this evaluation protocol and proposes Leave-One-Dataset-Out (LODO) cross-validation to eliminate partition randomness, thereby enabling more robust assessment of cross-dataset generalization. Experiments across five mainstream AU datasets reveal substantial evaluation instability for low-prevalence AUs (e.g., an average F1-score noise floor of ±0.065 on BP4D+) and demonstrate that LODO uncovers domain-level instabilities invisible to single-dataset cross-validation. These findings suggest that many reported performance gains may fall within the margin of evaluation variance rather than reflecting true model advances.
📝 Abstract
Subject-exclusive cross-validation is the standard evaluation protocol for facial Action Unit (AU) detection, yet reported improvements are often small. We show that cross-validation itself introduces measurable stochastic variance. On BP4D+, repeated 3-fold subject-exclusive splits produce an empirical noise floor of $\pm 0.065$ in average F1, with substantially larger variation for low-prevalence AUs. Operating-point metrics such as F1 fluctuate more than threshold-independent measures such as AUC, and model rankings can change under different fold assignments.
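To make the protocol concrete, below is a minimal sketch (not the authors' code) of how the repeated subject-exclusive 3-fold experiment can be implemented: whole subjects are randomly reassigned to folds under different seeds, and the spread of average F1 across repeats gives the empirical noise floor. The `train_and_predict` function and the `features`/`labels`/`subjects` arrays are hypothetical placeholders, not the paper's actual pipeline.

```python
import numpy as np
from sklearn.metrics import f1_score


def subject_exclusive_folds(subjects, n_folds, rng):
    """Randomly assign whole subjects to folds so no identity crosses folds."""
    ids = rng.permutation(np.unique(subjects))
    return np.array_split(ids, n_folds)


def noise_floor(features, labels, subjects, train_and_predict,
                n_folds=3, n_repeats=20, seed=0):
    """Repeat subject-exclusive k-fold CV and measure the spread of average F1."""
    rng = np.random.default_rng(seed)
    avg_f1_per_repeat = []
    for _ in range(n_repeats):
        fold_f1 = []
        for held_out in subject_exclusive_folds(subjects, n_folds, rng):
            test_mask = np.isin(subjects, held_out)
            preds = train_and_predict(features[~test_mask], labels[~test_mask],
                                      features[test_mask])
            # Binary F1 per AU column, averaged over AUs (operating-point metric).
            fold_f1.append(np.mean([
                f1_score(labels[test_mask][:, au], preds[:, au])
                for au in range(labels.shape[1])
            ]))
        avg_f1_per_repeat.append(np.mean(fold_f1))
    # Std across repeats reflects partition randomness alone: the noise floor.
    return float(np.mean(avg_f1_per_repeat)), float(np.std(avg_f1_per_repeat))
```

Because the model and data are fixed and only the subject-to-fold assignment changes between repeats, the standard deviation returned here isolates the variance contributed by the evaluation protocol itself.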
We further evaluate cross-dataset robustness using a Leave-One-Dataset-Out (LODO) protocol across five AU datasets. LODO removes partition randomness and exposes domain-level instability that is not visible under single-dataset cross-validation. Together, these results suggest that gains reported under subject-exclusive cross-validation may fall within protocol variance, and that LODO yields more stable and interpretable findings.
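For comparison, a LODO evaluation can be sketched as follows. It involves no random partitioning at all: each run trains on four datasets and tests on the fixed held-out fifth. The dataset names, `load_dataset`, and `train_and_predict` are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np
from sklearn.metrics import f1_score

# Assumed dataset names for illustration; the paper specifies five AU datasets.
DATASETS = ["BP4D", "BP4D+", "DISFA", "GFT", "Aff-Wild2"]


def lodo_evaluation(load_dataset, train_and_predict):
    """Train on all datasets but one, evaluate on the held-out domain."""
    results = {}
    for held_out in DATASETS:
        train_parts = [load_dataset(d) for d in DATASETS if d != held_out]
        x_train = np.concatenate([x for x, _ in train_parts])
        y_train = np.concatenate([y for _, y in train_parts])
        x_test, y_test = load_dataset(held_out)
        preds = train_and_predict(x_train, y_train, x_test)
        # Average F1 over the AU labels shared across all five datasets.
        results[held_out] = float(np.mean([
            f1_score(y_test[:, au], preds[:, au])
            for au in range(y_test.shape[1])
        ]))
    return results  # one deterministic score per held-out domain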