Spatio-Temporal Representation Decoupling and Enhancement for Federated Instrument Segmentation in Surgical Videos

📅 2025-06-30
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In federated learning (FL) for surgical instrument segmentation, severe data heterogeneity across multi-center clinical sites (diverse anatomical backgrounds combined with highly similar instrument appearance) hampers model generalization and privacy-preserving collaboration. Method: The paper proposes FedST, a personalized FL framework for spatio-temporal representation decoupling and enhancement. In local-site training, a Representation Separation and Cooperation (RSC) mechanism keeps the query embedding layer private to encode each site's background, while the remaining parameters, including the temporal layer, are optimized globally to capture consistent instrument appearance and motion; a text-guided channel selection further highlights site-specific features. In global-server training, Synthesis-based Explicit Representation Quantification (SERQ) uses simulator-generated synthetic data to define an explicit representation target that synchronizes model convergence during fusion. Contribution/Results: Evaluated on real and synthetic surgical videos, the method improves segmentation accuracy and robustness under FL constraints while preserving data privacy and enabling effective cross-center collaboration.
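The parameter partitioning described above (site-private query embeddings, globally shared instrument and temporal parameters) can be sketched as follows. This is a minimal illustration, not the paper's implementation; the module name `query_embed` and the plain weighted-average fusion (FedAvg-style) are assumptions.

```python
from collections import OrderedDict

# Assumed name of the decoupled, site-private layer; the paper's actual
# module names are not given in this summary.
PRIVATE_PREFIX = "query_embed"

def split_params(state_dict):
    """Partition parameters into site-private (background) and shared parts."""
    private, shared = OrderedDict(), OrderedDict()
    for name, value in state_dict.items():
        (private if name.startswith(PRIVATE_PREFIX) else shared)[name] = value
    return private, shared

def fedavg(shared_dicts, weights):
    """Weighted average of the shared parameters contributed by all sites."""
    avg = OrderedDict()
    for name in shared_dicts[0]:
        avg[name] = sum(w * d[name] for w, d in zip(weights, shared_dicts))
    return avg
```

Only the `shared` part would be sent to the server; each site reloads its own `private` part after fusion.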

📝 Abstract
Surgical instrument segmentation under Federated Learning (FL) is a promising direction, enabling multiple surgical sites to collaboratively train a model without centralizing datasets. However, FL works in surgical data science remain very limited, and FL methods designed for other modalities do not consider inherent characteristics of the surgical domain: i) different scenarios show diverse anatomical backgrounds yet highly similar instrument representations; ii) surgical simulators can generate large-scale synthetic data with minimal effort. In this paper, we propose a novel personalized FL scheme, Spatio-Temporal Representation Decoupling and Enhancement (FedST), which leverages surgical domain knowledge during both local-site and global-server training to boost segmentation. Concretely, our model embraces a Representation Separation and Cooperation (RSC) mechanism in local-site training, which decouples the query embedding layer to be trained privately, encoding each site's background. Meanwhile, the other parameters are optimized globally to capture consistent instrument representations, including a temporal layer that captures similar motion patterns. A text-guided channel selection is further designed to highlight site-specific features, facilitating model adaptation to each site. Moreover, in global-server training, we propose Synthesis-based Explicit Representation Quantification (SERQ), which defines an explicit representation target based on synthetic data to synchronize model convergence during fusion and improve generalization.
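The text-guided channel selection mentioned in the abstract can be understood as gating feature channels with weights derived from a text embedding. The sketch below assumes the prompt is already encoded into a vector (e.g. by a frozen text encoder); the projection matrix and sigmoid gating are illustrative simplifications, not the paper's exact design.

```python
import math

def sigmoid(x):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def channel_gate(text_emb, proj, feature_channels):
    """Scale each feature channel by a gate derived from the text embedding.

    proj has one row per channel; each row projects the text embedding
    to a scalar, which is squashed into (0, 1) and used as that channel's gate.
    """
    gates = [sigmoid(sum(w * t for w, t in zip(row, text_emb))) for row in proj]
    return [g * c for g, c in zip(gates, feature_channels)]
```

In the paper's setting, such a gate would emphasize the channels most relevant to a given site's description while suppressing the rest.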
Problem

Research questions and friction points this paper is trying to address.

Decoupling spatio-temporal representations for surgical instrument segmentation
Injecting surgical domain knowledge into federated learning
Improving model generalization via synthetic-data-based representation quantification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Keeps the query embedding layer site-private while sharing the rest (RSC)
Highlights site-specific features via text-guided channel selection
Defines an explicit representation target from synthetic data (SERQ)
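The SERQ contribution above can be sketched as deriving an explicit representation target from synthetic-simulator features and penalizing each site's deviation from it during server-side fusion. The mean-feature target and squared-error penalty below are illustrative simplifications, not the paper's exact formulation.

```python
def representation_target(synthetic_feats):
    """Explicit target: per-dimension mean over synthetic-data features."""
    n = len(synthetic_feats)
    dim = len(synthetic_feats[0])
    return [sum(f[d] for f in synthetic_feats) / n for d in range(dim)]

def alignment_loss(site_feat, target):
    """Squared distance between one site's feature and the shared target."""
    return sum((s - t) ** 2 for s, t in zip(site_feat, target))
```

Anchoring every site to the same synthetic-data target is one way to synchronize convergence across heterogeneous sites, which is the role the abstract attributes to SERQ.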