🤖 AI Summary
Problem: Fine-grained surgical phase recognition in endoscopic submucosal dissection (ESD) is challenging because of strong lesion heterogeneity and complex, dynamic tissue interactions, and existing video temporal models struggle both to capture long-range dependencies and to run in real time.
Method: We propose SPRMamba, a framework built around the Scaled Residual TranMamba (SRTM) block, which combines a Mamba state-space model with a scaled residual mechanism. Together with a Hierarchical Sampling Strategy, this design targets both robust long-range temporal modeling and efficient real-time inference. The method learns temporal structure end-to-end from video, without requiring manual keyframe annotations.
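To make the state-space idea behind the Scaled Residual TranMamba (SRTM) block more concrete, here is a minimal pure-Python sketch of a causal selective state-space scan (in the spirit of Mamba, with an input-dependent step size) wrapped in a scaled residual connection `x + alpha * SSM(x)`. This is an illustrative assumption, not the authors' implementation: the names `SSMParams`, `selective_scan`, `scaled_residual_block` and the scale `alpha` are hypothetical, each channel uses a single scalar state, and the localized-detail branch of the actual SRTM block is not modeled.

```python
import math
from dataclasses import dataclass
from typing import List

Sequence = List[List[float]]  # T frames x D feature channels


@dataclass
class SSMParams:
    """Hypothetical per-channel parameters of a diagonal state-space model."""
    A: List[float]        # positive decay rates; a_bar = exp(-delta * A) lies in (0, 1)
    B: List[float]        # input projection
    C: List[float]        # output projection
    w_delta: List[float]  # makes the step size delta depend on the input (selectivity)


def softplus(z: float) -> float:
    # Numerically stable softplus: log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(x: Sequence, p: SSMParams) -> Sequence:
    """Causal scan: the output at frame t depends only on frames 0..t."""
    num_channels = len(x[0]) if x else 0
    h = [0.0] * num_channels  # one scalar hidden state per channel
    y: Sequence = []
    for frame in x:
        out = []
        for d, x_td in enumerate(frame):
            delta = softplus(p.w_delta[d] * x_td)  # input-dependent step size
            a_bar = math.exp(-delta * p.A[d])      # zero-order-hold decay factor
            b_bar = delta * p.B[d]                 # simplified (Euler) input discretization
            h[d] = a_bar * h[d] + b_bar * x_td     # recurrent state update
            out.append(p.C[d] * h[d])
        y.append(out)
    return y


def scaled_residual_block(x: Sequence, p: SSMParams, alpha: float) -> Sequence:
    """Hypothetical scaled residual wrapper: x + alpha * SSM(x)."""
    y = selective_scan(x, p)
    return [[x_td + alpha * y_td for x_td, y_td in zip(xt, yt)]
            for xt, yt in zip(x, y)]


if __name__ == "__main__":
    frames = [[1.0, -0.5], [0.2, 0.3], [0.0, 1.0]]  # 3 frames, 2 channels
    params = SSMParams(A=[1.0, 0.5], B=[1.0, 1.0], C=[1.0, 2.0], w_delta=[0.5, 0.5])
    print(scaled_residual_block(frames, params, alpha=0.1))
```

Because the scan carries only a fixed-size state forward, each new frame costs O(D) work regardless of video length, which is what makes state-space models attractive for online phase recognition. Production Mamba implementations instead use a multi-dimensional state per channel and a hardware-aware parallel scan.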
Contribution/Results: On the ESD385 dataset, SPRMamba achieves 87.64% accuracy (+1.0% over the prior state of the art), and it also achieves state-of-the-art performance on the Cholec80 cholecystectomy benchmark, indicating strong cross-domain generalization. Its combination of accuracy and computational efficiency makes it well suited to real-time clinical deployment.
📝 Abstract
Endoscopic Submucosal Dissection (ESD) is a minimally invasive procedure that was initially developed to treat early gastric cancer and has since expanded to address diverse gastrointestinal lesions. While computer-assisted surgery (CAS) systems enhance the precision and safety of ESD, their efficacy hinges on accurate real-time surgical phase recognition, a task complicated by ESD's inherent complexity, including heterogeneous lesion characteristics and dynamic tissue interactions. Existing video-based phase recognition algorithms, constrained by inefficient temporal context modeling, show limited performance in capturing fine-grained phase transitions and long-range dependencies. To overcome these limitations, we propose SPRMamba, a novel framework that integrates a Mamba-based architecture with a Scaled Residual TranMamba (SRTM) block to combine long-term temporal modeling with localized detail extraction. SPRMamba further introduces a Hierarchical Sampling Strategy to improve computational efficiency, enabling the real-time processing that is critical for clinical deployment. Evaluated on the ESD385 dataset and the cholecystectomy benchmark Cholec80, SPRMamba achieves state-of-the-art performance (87.64% accuracy on ESD385, +1.0% over prior methods), demonstrating robust generalizability across surgical workflows. This advancement bridges the gap between computational efficiency and temporal sensitivity, offering a transformative tool for intraoperative guidance and skill assessment in ESD surgery. The code is available at https://github.com/Zxnyyyyy/SPRMamba.
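The abstract does not spell out how the Hierarchical Sampling Strategy works. One common way to keep per-frame cost bounded while still reaching long-range context is multi-level temporal sampling: dense sampling of recent frames and progressively sparser sampling of older ones. The sketch below illustrates only that general idea under stated assumptions; it is not the strategy used in SPRMamba, and `hierarchical_sample`, `frames_per_level`, and `num_levels` are hypothetical names.

```python
from typing import List


def hierarchical_sample(t: int, frames_per_level: int, num_levels: int) -> List[int]:
    """Hypothetical multi-level sampler of past frame indices for frame t.

    Level k takes `frames_per_level` frames spaced 2**k apart, starting where
    level k-1 stopped, so recent context is dense and distant context sparse.
    At most frames_per_level * num_levels indices are returned, however long
    the video is.
    """
    if t < 0 or frames_per_level < 1 or num_levels < 1:
        raise ValueError("t must be >= 0; frames_per_level and num_levels must be >= 1")
    indices: List[int] = []
    end = t  # first index of the current level
    for level in range(num_levels):
        stride = 2 ** level  # spacing doubles at each level
        for k in range(frames_per_level):
            idx = end - k * stride
            if idx < 0:  # ran past the start of the video
                break
            indices.append(idx)
        end -= frames_per_level * stride  # next level starts after this one
        if end < 0:
            break
    return sorted(indices)


if __name__ == "__main__":
    print(hierarchical_sample(20, frames_per_level=4, num_levels=3))
    # [0, 4, 8, 10, 12, 14, 16, 17, 18, 19, 20]
```

With 4 frames per level and 3 levels, frame 10000 draws on only 12 past indices (9976 to 10000), reaching 24 frames back, so the per-frame cost stays constant as the video grows. The exponential stride is the design choice that trades temporal resolution in the distant past for a fixed compute budget.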