🤖 AI Summary
This work addresses a limitation of conventional upsampling methods such as transposed convolution and interpolation: they sample on fixed grids, so they can miss fine structures in medical images and introduce artifacts. To overcome this, the authors propose the Deformable Transposed Convolution (DTC) module, which they present as a novel upsampling method that carries the mechanism of deformable convolution over to the upsampling step. By learning dynamic sampling coordinates, DTC reconstructs features from positions that are not tied to a predefined grid. The module supports both 2D and 3D medical images and can be integrated into UNet-style decoders. Experiments on 3D (e.g., BTCV15) and 2D (e.g., ISIC18, BUSI) datasets show that adding DTC to existing segmentation models consistently improves the decoder's feature reconstruction and fine-detail recovery.
📝 Abstract
In medical image segmentation, particularly in UNet-like architectures, upsampling is primarily used to transform smaller feature maps into larger ones, enabling feature fusion between encoder and decoder features and supporting multi-scale prediction. Conventional upsampling methods, such as transposed convolution and linear interpolation, operate on fixed positions: transposed convolution applies kernel elements to predetermined pixel or voxel locations, while linear interpolation assigns values based on fixed coordinates in the original feature map. These fixed-position approaches may fail to capture structural information beyond predefined sampling positions and can lead to artifacts or loss of detail. Inspired by deformable convolutions, we propose a novel upsampling method, Deformable Transposed Convolution (DTC), which learns dynamic coordinates (i.e., sampling positions) to generate high-resolution feature maps for both 2D and 3D medical image segmentation tasks. Experiments on 3D (e.g., BTCV15) and 2D datasets (e.g., ISIC18, BUSI) demonstrate that DTC can be effectively integrated into existing medical image segmentation models, consistently improving the decoder's feature reconstruction and detail recovery capability.
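The contrast between fixed and dynamic sampling positions can be illustrated with a minimal sketch. It is not the authors' DTC formulation, which the abstract does not specify. It only shows a 2D upsampler that reads the input at fixed grid coordinates, optionally shifted by a per-output-pixel offset. The function names `bilinear_sample` and `upsample`, the half-pixel coordinate mapping, and the `offsets` argument are all assumptions made for illustration. In a learned module the offsets would presumably be predicted by the network, whereas here they are passed in by hand.

```python
import math


def bilinear_sample(fmap, y, x):
    """Bilinearly interpolate a 2D list-of-lists at fractional (y, x).

    Coordinates outside the map are clamped to the border.
    """
    h, w = len(fmap), len(fmap[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * fmap[y0][x0] + wx * fmap[y0][x1]
    bottom = (1 - wx) * fmap[y1][x0] + wx * fmap[y1][x1]
    return (1 - wy) * top + wy * bottom


def upsample(fmap, scale, offsets=None):
    """Upsample `fmap` by an integer `scale`.

    With offsets=None every output pixel reads from a fixed position
    (conventional interpolation). With offsets[i][j] = (dy, dx) the read
    position of output pixel (i, j) is shifted, which is the "dynamic
    coordinates" idea in its simplest, hand-specified form (an assumption,
    not the paper's method).
    """
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(h * scale):
        row = []
        for j in range(w * scale):
            # Fixed grid: map the output pixel centre back to input coordinates.
            y = (i + 0.5) / scale - 0.5
            x = (j + 0.5) / scale - 0.5
            if offsets is not None:
                dy, dx = offsets[i][j]
                y, x = y + dy, x + dx
            row.append(bilinear_sample(fmap, y, x))
        out.append(row)
    return out


if __name__ == "__main__":
    fmap = [[0.0, 1.0], [2.0, 3.0]]
    fixed = upsample(fmap, 2)
    # Shift every sampling position half an input pixel to the right.
    shift = [[(0.0, 0.5)] * 4 for _ in range(4)]
    shifted = upsample(fmap, 2, shift)
    print(fixed[1][1], shifted[1][1])
```

With all offsets set to zero the second path reduces to the fixed-grid result, so the offsets act purely as a correction on top of the conventional sampling positions.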