🤖 AI Summary
This study addresses critical safety concerns in medical applications arising from AI-generated CT images, where existing detection methods suffer from insufficient sensitivity to forgery artifacts and a lack of standardized benchmarks for evaluating generalization. To bridge this gap, the authors introduce CTForensics—the first comprehensive benchmark dataset encompassing CT images synthesized by ten diverse generative models—and propose ESF-CTFD, an enhanced spatial-frequency domain detector. Built upon a CNN architecture, ESF-CTFD integrates wavelet transforms, multi-scale spatial features, and frequency-domain analysis to enable end-to-end forgery detection through joint spatial-frequency modeling. Extensive experiments demonstrate that ESF-CTFD consistently outperforms state-of-the-art methods across multiple generative models, achieving superior detection accuracy and strong cross-model generalization capability.
📝 Abstract
With the rapid development of generative AI in medical imaging, synthetic Computed Tomography (CT) images have shown great potential for applications such as data augmentation and clinical diagnosis, but they also introduce serious security risks. Despite growing security concerns, existing studies on CT forgery detection remain limited and fail to adequately address real-world challenges. These limitations manifest in two ways: the absence of datasets that can effectively evaluate model generalization under real-world conditions, and the reliance on detection methods designed for natural images, which are insensitive to CT-specific forgery artifacts. To this end, we propose CTForensics, a comprehensive dataset covering ten diverse CT generative methods, designed to systematically evaluate the generalization capability of CT forgery detection approaches. Moreover, we introduce the Enhanced Spatial-Frequency CT Forgery Detector (ESF-CTFD), an efficient CNN-based network that captures forgery cues across the wavelet, spatial, and frequency domains. First, it rescales the input CT image to three scales and extracts features at each scale via the Wavelet-Enhanced Central Stem. Then, starting from the largest-scale features, the Spatial Process Block progressively fuses them with the smaller-scale ones. Finally, the Frequency Process Block learns frequency-domain information to predict the final result. Experiments demonstrate that ESF-CTFD consistently outperforms existing methods and exhibits superior generalization across different CT generative models.
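To make the spatial-frequency idea concrete, here is a minimal NumPy sketch of the two kinds of cues the detector operates on: multi-scale wavelet detail bands (where generator artifacts tend to concentrate) and a global frequency-domain statistic. The function names, the single-level Haar transform, and the radial-energy summary are illustrative assumptions for exposition only; the paper's actual ESF-CTFD blocks are learned CNN modules, not these hand-crafted operations.

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2-D Haar wavelet transform.

    Returns the low-frequency approximation (LL) and the three
    high-frequency detail bands (LH, HL, HH), each at half resolution.
    """
    a = img[0::2, 0::2]
    b = img[0::2, 1::2]
    c = img[1::2, 0::2]
    d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4.0   # approximation
    lh = (a - b + c - d) / 4.0   # horizontal details
    hl = (a + b - c - d) / 4.0   # vertical details
    hh = (a - b - c + d) / 4.0   # diagonal details
    return ll, lh, hl, hh

def multiscale_features(ct, levels=3):
    """Stand-in for the three-scale stem: wavelet detail bands per scale.

    High-frequency bands often expose generator artifacts that are
    invisible in the pixel domain.
    """
    feats = []
    cur = ct
    for _ in range(levels):
        ll, lh, hl, hh = haar_dwt2(cur)
        feats.append(np.stack([lh, hl, hh]))
        cur = ll  # recurse into the low-frequency approximation
    return feats

def frequency_energy(ct):
    """Stand-in for the frequency branch: mean FFT magnitude."""
    spec = np.abs(np.fft.fftshift(np.fft.fft2(ct)))
    return float(spec.mean())

ct = np.random.rand(256, 256).astype(np.float32)
feats = multiscale_features(ct)
print([f.shape for f in feats])  # [(3, 128, 128), (3, 64, 64), (3, 32, 32)]
print(frequency_energy(ct) > 0.0)
```

In ESF-CTFD these hand-crafted transforms are replaced by trainable layers, but the sketch shows why joint spatial-frequency modeling helps: forgery artifacts that are subtle at one scale or in one domain can be pronounced in another.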