🤖 AI Summary
Automatic pain assessment for non-communicative patients faces two challenges: real-world clinical data are scarce and severely imbalanced across demographics and labels, and generative models lack clinically interpretable control. Method: We propose 3DPain—the first controllable, multimodal synthetic dataset for pain assessment—comprising 82,500 samples spanning 2,500 identities and 25,000 pain heatmaps, with precise ternary alignment of anatomical pain heatmaps, facial action unit (AU) configurations, and PSPI clinical scores. Our approach introduces a three-stage controllable generation framework (3D mesh modeling → diffusion-based texture mapping → AU-driven expression binding) and ViTPain, a cross-modal distillation model integrating a Vision Transformer with heatmap-guided teacher–student distillation. Contribution/Results: 3DPain significantly surpasses existing benchmarks in diversity, annotation richness, and clinical alignment, enabling highly explainable, cross-population, and multi-view pain recognition research.
📝 Abstract
Automated pain assessment from facial expressions is crucial for non-communicative patients, such as those with dementia. Progress has been limited by two challenges: (i) existing datasets exhibit severe demographic and label imbalance due to ethical constraints, and (ii) current generative models cannot precisely control facial action units (AUs), facial structure, or clinically validated pain levels.
We present 3DPain, a large-scale synthetic dataset specifically designed for automated pain assessment, featuring unprecedented annotation richness and demographic diversity. Our three-stage framework generates diverse 3D meshes, textures them with diffusion models, and applies AU-driven face rigging to synthesize multi-view faces with paired neutral and pain images, AU configurations, PSPI scores, and the first dataset-level annotations of pain-region heatmaps. The dataset comprises 82,500 samples across 25,000 pain expression heatmaps and 2,500 synthetic identities balanced by age, gender, and ethnicity.
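The PSPI (Prkachin–Solomon Pain Intensity) scores annotated in the dataset follow the standard clinical formula, which combines intensities of a fixed set of pain-related action units. A minimal sketch (the function name and signature are illustrative, not from the paper):

```python
def pspi(au4, au6, au7, au9, au10, au43):
    """Prkachin-Solomon Pain Intensity from AU intensities.

    AU4 (brow lowerer), AU6 (cheek raiser), AU7 (lid tightener),
    AU9 (nose wrinkler), and AU10 (upper lip raiser) are scored 0-5;
    AU43 (eyes closed) is binary (0/1), giving a total range of 0-16.
    """
    return au4 + max(au6, au7) + max(au9, au10) + au43


# Example: strong brow lowering with tightened lids and closed eyes.
print(pspi(au4=4, au6=2, au7=3, au9=1, au10=0, au43=1))  # → 9
```

Because each sample pairs an AU configuration with its PSPI score, the label can be recomputed from the AU annotations, which is what makes the ternary alignment (heatmap, AUs, PSPI) verifiable.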
We further introduce ViTPain, a Vision Transformer-based cross-modal distillation framework in which a heatmap-trained teacher guides a student trained on RGB images, enhancing accuracy, interpretability, and clinical reliability. Together, 3DPain and ViTPain establish a controllable, diverse, and clinically grounded foundation for generalizable automated pain assessment.
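The teacher–student transfer described above can be sketched with a standard soft-label distillation objective: the student matches the heatmap-trained teacher's softened predictions while also fitting the ground-truth label. All names, the temperature/weighting scheme, and the loss form are assumptions for illustration; the paper's exact objective is not specified here.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax over a logit vector."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, label, T=2.0, alpha=0.5):
    """Hypothetical combined loss: KL to teacher soft targets + hard-label CE.

    student_logits: RGB-branch (student) class logits
    teacher_logits: heatmap-branch (teacher) class logits
    label:          ground-truth class index (e.g. a binned PSPI level)
    """
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    # KL(teacher || student) on temperature-softened distributions,
    # rescaled by T^2 as is conventional in distillation.
    kl = float(np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))))
    # Cross-entropy against the hard label at T = 1.
    ce = float(-np.log(softmax(student_logits)[label] + 1e-12))
    return alpha * (T ** 2) * kl + (1 - alpha) * ce
```

At inference only the RGB student is needed; the heatmap teacher serves purely as a training-time source of spatial, pain-region-aware supervision.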