🤖 AI Summary
In operating rooms, highly standardized attire among medical staff causes deep learning models to rely on spurious visual cues—such as shoes or eyewear—leading to biased identity and behavior recognition, thereby hindering accurate modeling of individualized surgical competencies and team collaboration patterns. Method: We propose a geometric representation method based on 3D point cloud sequences that explicitly disentangles appearance-related confounders; this is the first work to introduce explicit geometric modeling for bias mitigation in surgical settings. Contribution/Results: Through dual-modality contrastive experiments (RGB vs. geometry), gradient-based saliency analysis, and CNN-based architecture evaluation, we demonstrate a 12% accuracy improvement over RGB-only models in real-world clinical scenarios. Our approach significantly enhances model generalizability and robustness against appearance-induced biases in low-visual-diversity environments.
📝 Abstract
Deep neural networks are prone to learning spurious correlations, exploiting dataset-specific artifacts rather than meaningful features for prediction. In surgical operating rooms (OR), these manifest through the standardization of smocks and gowns that obscure robust identifying landmarks, introducing model bias for tasks related to modeling OR personnel. Through gradient-based saliency analysis on two public OR datasets, we reveal that CNN models succumb to such shortcuts, fixating on incidental visual cues such as footwear beneath surgical gowns, distinctive eyewear, or other role-specific identifiers. Avoiding such biases is essential for the next generation of intelligent assistance systems in the OR, which should accurately recognize personalized workflow traits, such as surgical skill level or coordination with other staff members. We address this problem by encoding personnel as 3D point cloud sequences, disentangling identity-relevant shape and motion patterns from appearance-based confounders. Our experiments demonstrate that while RGB and geometric methods achieve comparable performance on datasets with apparent simulation artifacts, RGB models suffer a 12% accuracy drop in realistic clinical settings with decreased visual diversity due to standardizations. This performance gap confirms that geometric representations capture more meaningful biometric features, providing an avenue to developing robust methods of modeling humans in the OR.