🤖 AI Summary
This work addresses the severe degradation of endoscopic imaging caused by surgical smoke, which impairs visual perception and downstream tasks in minimally invasive and robot-assisted surgery. We propose a desmoking model that integrates physical priors with deep learning, employing a Transformer backbone coupled with a physics-inspired desmoking head to jointly predict smoke-free images and smoke distribution maps. To support large-scale supervised training, we develop a synthetic smoke generation pipeline that yields over 80,000 paired samples, and we construct what is, to our knowledge, the largest paired surgical smoke dataset to date, comprising 5,817 image pairs captured with the da Vinci system. Extensive experiments show that our method achieves state-of-the-art image reconstruction on both a public benchmark and the new dataset. We also evaluate how desmoking affects downstream tasks such as stereo depth estimation and surgical instrument segmentation, revealing both potential benefits and current limitations.
📝 Abstract
Minimally invasive and robot-assisted surgery relies heavily on endoscopic imaging, yet surgical smoke produced by electrocautery and vessel-sealing instruments can severely degrade visual perception and hinder vision-based functionalities. We present a transformer-based surgical desmoking model with a physics-inspired desmoking head that jointly predicts a smoke-free image and the corresponding smoke map. To address the scarcity of paired smoky-to-smoke-free training data, we develop a synthetic data generation pipeline that blends artificial smoke patterns with real endoscopic images, yielding over 80,000 paired samples for supervised training. We further curate, to our knowledge, the largest paired surgical smoke dataset to date, comprising 5,817 image pairs captured with the da Vinci robotic surgical system, enabling benchmarking on high-resolution endoscopic images. Extensive experiments on both a public benchmark and our dataset demonstrate state-of-the-art performance in image reconstruction compared to existing dehazing and desmoking approaches. We also assess the impact of desmoking on downstream stereo depth estimation and instrument segmentation, highlighting both the potential benefits and current limitations of digital smoke removal methods.
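The abstract does not state which smoke formation model the desmoking head or the synthetic pipeline uses. A common assumption borrowed from dehazing is an alpha-compositing (atmospheric-scattering-style) model, I = J(1 - S) + A·S. Here J is the clean image, S is a per-pixel smoke map in [0, 1), and A is a smoke color. Under that assumption, knowing S and A lets you invert the model to recover J. The NumPy sketch below shows both directions. The function names, the Gaussian-blob smoke generator (a stand-in for "artificial smoke patterns"), and the constant smoke color are hypothetical illustrations, not the authors' pipeline or head.

```python
import numpy as np

def random_smoke_map(h, w, n_blobs=5, rng=None):
    """Hypothetical smoke pattern: a sum of random Gaussian blobs, clipped below 1."""
    rng = np.random.default_rng(rng)
    yy, xx = np.mgrid[0:h, 0:w]
    s = np.zeros((h, w))
    for _ in range(n_blobs):
        cy, cx = rng.uniform(0, h), rng.uniform(0, w)
        sigma = rng.uniform(0.1, 0.3) * min(h, w)
        amp = rng.uniform(0.3, 0.8)
        s += amp * np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma ** 2))
    # Keep S < 1 so that the inversion below never divides by zero.
    return np.clip(s, 0.0, 0.95)

def blend_smoke(clear, smoke_map, smoke_color=0.9):
    """Assumed forward model: I = J * (1 - S) + A * S, with S broadcast over channels."""
    s = smoke_map[..., None]
    return clear * (1.0 - s) + smoke_color * s

def recover_clear(smoky, smoke_map, smoke_color=0.9, eps=1e-3):
    """Invert the assumed model: J = (I - A * S) / (1 - S), guarding against S near 1."""
    s = smoke_map[..., None]
    return (smoky - smoke_color * s) / np.clip(1.0 - s, eps, None)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clear = rng.random((64, 64, 3))          # stand-in for a real endoscopic frame
    S = random_smoke_map(64, 64, rng=1)      # synthetic smoke map
    smoky = blend_smoke(clear, S)            # paired smoky input
    print(np.allclose(recover_clear(smoky, S), clear))
```

In this toy setting the smoke map is known exactly, so the inversion is lossless. A learned model that jointly predicts the clean image and S must estimate both from the smoky input alone, and the inversion becomes unstable wherever S approaches 1.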