Beyond the Fold: Quantifying Split-Level Noise and the Case for Leave-One-Dataset-Out AU Evaluation

πŸ“… 2026-04-02
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This study addresses the unreliability of conventional subject-exclusive cross-validation in facial action unit (AU) detection, which is susceptible to random partitioning noise that obscures genuine performance improvements. For the first time, the work systematically quantifies the stochastic variance introduced by this evaluation protocol and proposes Leave-One-Dataset-Out (LODO) cross-validation to eliminate partition randomness, thereby enabling more robust assessment of cross-dataset generalization. Experiments across five mainstream AU datasets reveal substantial evaluation instability for low-prevalence AUsβ€”e.g., an average F1-score noise floor of Β±0.065 on BP4D+β€”and demonstrate that LODO uncovers domain-level instabilities invisible to single-dataset cross-validation. These findings suggest that many reported performance gains may fall within the margin of evaluation variance rather than reflecting true model advances.
πŸ“ Abstract
Subject-exclusive cross-validation is the standard evaluation protocol for facial Action Unit (AU) detection, yet reported improvements are often small. We show that cross-validation itself introduces measurable stochastic variance. On BP4D+, repeated 3-fold subject-exclusive splits produce an empirical noise floor of $\pm 0.065$ in average F1, with substantially larger variation for low-prevalence AUs. Operating-point metrics such as F1 fluctuate more than threshold-independent measures such as AUC, and model ranking can change under different fold assignments. We further evaluate cross-dataset robustness using a Leave-One-Dataset-Out (LODO) protocol across five AU datasets. LODO removes partition randomness and exposes domain-level instability that is not visible under single-dataset cross-validation. Together, these results suggest that gains often reported in cross-fold validation may fall within protocol variance. Leave-One-Dataset-Out cross-validation yields more stable and interpretable findings.
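The "noise floor" the abstract describes can be estimated by repeating the subject-exclusive split under different random seeds while holding everything else fixed. The sketch below is a minimal illustration with synthetic data, not the paper's implementation: the subject counts, prevalence, and the noisy fixed predictor are all invented for demonstration, so the spread in F1 comes purely from which subjects land in which test fold.

```python
import numpy as np

def f1_score(y_true, y_pred):
    """Binary F1 from 0/1 label arrays."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def subject_exclusive_f1(subjects, y_true, y_pred, n_folds=3, seed=0):
    """Average test-fold F1 for one random subject-exclusive k-fold split."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.unique(subjects))
    folds = np.array_split(shuffled, n_folds)
    scores = []
    for fold_subjects in folds:
        mask = np.isin(subjects, fold_subjects)  # frames of held-out subjects
        scores.append(f1_score(y_true[mask], y_pred[mask]))
    return float(np.mean(scores))

# Synthetic stand-in: 40 subjects, 50 frames each, a low-prevalence AU
# (~10% positive) and a fixed, noisy predictor.
rng = np.random.default_rng(42)
subjects = np.repeat(np.arange(40), 50)
y_true = (rng.random(subjects.size) < 0.10).astype(int)
y_pred = np.where(rng.random(subjects.size) < 0.8, y_true, 1 - y_true)

# Repeat the split with different seeds; the spread across seeds is the
# protocol's empirical noise floor for this (fixed) predictor.
f1s = [subject_exclusive_f1(subjects, y_true, y_pred, seed=s) for s in range(20)]
print(f"mean F1 = {np.mean(f1s):.3f}, split-to-split std = {np.std(f1s):.3f}")
```

Because the predictions never change between seeds, any split-to-split variation measures the evaluation protocol itself, which is the paper's central point.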
Problem

Research questions and friction points this paper is trying to address.

Action Unit detection
cross-validation variance
evaluation protocol
domain robustness
Leave-One-Dataset-Out
Innovation

Methods, ideas, or system contributions that make the work stand out.

subject-exclusive cross-validation
evaluation noise
Leave-One-Dataset-Out
Action Unit detection
cross-dataset robustness
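The LODO protocol listed above is deterministic by construction: each run trains on all datasets but one and tests on the held-out one, so there is no random partition to vary. A minimal sketch follows; the dataset names beyond BP4D+ are illustrative placeholders (the abstract names only BP4D+ among the five), and `train_fn`/`eval_fn` are hypothetical stand-ins for a real training and evaluation pipeline.

```python
# Hypothetical dataset list; only BP4D+ is named in the abstract.
datasets = ["BP4D", "BP4D+", "DatasetC", "DatasetD", "DatasetE"]

def run_lodo(datasets, train_fn, eval_fn):
    """Leave-One-Dataset-Out: train on all-but-one dataset, test on the
    held-out one. No random split is involved, so repeated runs are identical."""
    results = {}
    for held_out in datasets:
        train_sets = [d for d in datasets if d != held_out]
        model = train_fn(train_sets)
        results[held_out] = eval_fn(model, held_out)
    return results

# Toy stand-ins so the sketch runs end to end.
train = lambda sets: {"trained_on": tuple(sets)}   # placeholder trainer
evaluate = lambda model, ds: 0.5                   # placeholder F1 score
print(run_lodo(datasets, train, evaluate))
```

Per-held-out-dataset scores (rather than a single average) are what expose the domain-level instability the paper reports.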