🤖 AI Summary
This work addresses the performance degradation that dense prediction models suffer in surgical scenes when the deployment data distribution differs from the training distribution. The authors propose an unsupervised representation adaptation framework built on texture-aware attention, which introduces, for the first time, a texture-aware slot attention mechanism to dense prediction in surgical settings. By learning disentangled texture representations, the method adapts across domains without requiring target-domain annotations. A model merging strategy further improves performance. Across diverse surgical scenarios, the approach consistently outperforms existing segmentation models and test-time adaptation methods, improving the model's generalization under distribution shift.
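The slot attention mechanism mentioned above was originally proposed for object-centric learning: a small set of "slot" vectors iteratively competes, through a softmax taken over slots, to explain each location of a feature map. The sketch below is a minimal, simplified NumPy illustration of that generic mechanism, not the authors' texture-aware variant. The function names, the random untrained projections, and the plain residual slot update (used in place of the learned GRU and MLP of the original formulation) are all assumptions made for brevity. Reshaping the per-location attention back to the image grid shows how slots produce soft, segmentation-like groupings of pixels.

```python
import numpy as np


def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Normalize each row to zero mean and unit variance (no learned scale/shift)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def slot_attention(inputs: np.ndarray, num_slots: int = 4, dim: int = 32,
                   iters: int = 3, seed: int = 0, eps: float = 1e-8):
    """Simplified slot attention over N input features of size D_in.

    Returns (slots, attn): slots has shape (num_slots, dim); attn has shape
    (N, num_slots) and gives each input location's soft assignment to slots.
    Hypothetical simplification: random (untrained) projections and a plain
    residual update instead of the learned GRU + MLP of the original method.
    """
    rng = np.random.default_rng(seed)
    n, d_in = inputs.shape
    # Random projections stand in for learned key/value/query weights.
    w_k = rng.normal(scale=d_in ** -0.5, size=(d_in, dim))
    w_v = rng.normal(scale=d_in ** -0.5, size=(d_in, dim))
    w_q = rng.normal(scale=dim ** -0.5, size=(dim, dim))

    x = layer_norm(inputs)
    k, v = x @ w_k, x @ w_v                      # (N, dim) each
    slots = rng.normal(size=(num_slots, dim))    # random slot initialization

    for _ in range(iters):
        q = layer_norm(slots) @ w_q              # (K, dim)
        logits = k @ q.T / np.sqrt(dim)          # (N, K)
        # Softmax over slots: slots compete to explain each input location.
        attn = softmax(logits, axis=1)
        # Normalize over inputs so each slot takes a weighted mean of values.
        weights = (attn + eps) / (attn + eps).sum(axis=0, keepdims=True)
        updates = weights.T @ v                  # (K, dim)
        slots = slots + updates                  # simplified residual update
    return slots, attn


# Usage: an 8x8 feature map with 16 channels, flattened to 64 locations.
h, w, c = 8, 8, 16
feats = np.random.default_rng(1).normal(size=(h * w, c))
slots, attn = slot_attention(feats, num_slots=4)
masks = attn.reshape(h, w, 4)  # per-pixel soft assignment to each slot
```

Because the softmax runs over slots rather than over inputs, each pixel's assignments sum to one, which is what makes the attention map behave like a soft partition of the image.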
📝 Abstract
Dense prediction tasks in surgical computer vision, such as segmentation and surgical zone prediction, can provide valuable guidance for laparoscopic and robotic surgery. However, models for these tasks often suffer from distribution shift, because training datasets rarely cover the variability encountered during deployment, which leads to poor generalization. We propose DenseTRF, a self-supervised representation adaptation framework based on texture-centric attention. Our method leverages slot attention to learn texture-aware representations that capture invariant visual structures. By adapting these representations to the target distribution without supervision, DenseTRF significantly improves robustness to domain shifts. The framework is implemented by conditioning dense prediction on slot attention and by applying model merging strategies. Experiments across multiple surgical procedures demonstrate improved cross-distribution generalization compared with state-of-the-art segmentation models and test-distribution adaptation methods for dense prediction tasks.
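The abstract does not specify which model merging strategy DenseTRF uses. As a hedged illustration only, the sketch below shows the simplest common form of model merging: element-wise weighted averaging of the parameters of models that share one architecture (for example, a source-trained model and a copy adapted to the target distribution). The helper `merge_weights` and the toy parameter names are hypothetical.

```python
from typing import Dict, List, Optional

import numpy as np

Params = Dict[str, np.ndarray]


def merge_weights(models: List[Params],
                  coeffs: Optional[List[float]] = None) -> Params:
    """Weighted average of parameter dicts from models sharing one architecture.

    Hypothetical helper: `models` is a list of {name: array} dicts with
    identical keys and shapes; `coeffs` are weights summing to 1
    (uniform averaging if omitted).
    """
    if not models:
        raise ValueError("need at least one model to merge")
    if coeffs is None:
        coeffs = [1.0 / len(models)] * len(models)
    if len(coeffs) != len(models) or not np.isclose(sum(coeffs), 1.0):
        raise ValueError("coeffs must match the models and sum to 1")

    keys = models[0].keys()
    merged: Params = {}
    for name in keys:
        ref_shape = models[0][name].shape
        for m in models[1:]:
            # Refuse mismatched parameter sets instead of broadcasting silently.
            if m.keys() != keys or m[name].shape != ref_shape:
                raise ValueError(f"parameter mismatch at '{name}'")
        # Merge each parameter tensor element-wise.
        merged[name] = sum(c * m[name] for c, m in zip(coeffs, models))
    return merged


# Usage with two toy "models" holding the same hypothetical parameters.
source = {"conv.w": np.ones((2, 2)), "conv.b": np.zeros(2)}
adapted = {"conv.w": np.full((2, 2), 3.0), "conv.b": np.ones(2)}
merged = merge_weights([source, adapted], coeffs=[0.5, 0.5])
```

With equal coefficients every entry of `conv.w` becomes 2.0 and every entry of `conv.b` becomes 0.5. Skewing the coefficients (say `[0.7, 0.3]`) biases the merged model toward the first set of weights; which ratio works best for a given adaptation setting would have to be determined empirically.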