🤖 AI Summary
This study addresses the challenges of prostate cancer classification from transrectal ultrasound (TRUS) videos: information redundancy, high intra- and inter-class similarity, and a low signal-to-noise ratio, all of which hinder feature discriminability and diagnostic accuracy. To overcome these limitations, the authors propose HFS-TriNet, an architecture that integrates three parallel branches (one based on the medical Segment Anything Model (SAM), one built from wavelet-transform convolutional residual (WTCR) blocks, and one using ResNet50), augmented with a heuristic frame selection (HFS) mechanism and a normalization-based attention module. This design is intended to extract multi-scale features that jointly capture edge details, semantic consistency, and spatiotemporal dynamics. According to the summary, the method mitigates redundancy and noise interference while maintaining computational efficiency, improving the accuracy and robustness of prostate cancer classification.
📝 Abstract
Transrectal ultrasound (TRUS) imaging is a cost-effective and non-invasive modality widely used in the diagnosis of prostate cancer. Computer-aided diagnosis (CAD) based on TRUS images has been extensively investigated in recent years. Compared with static images, TRUS video provides richer spatiotemporal information, which makes it a promising alternative for improving the accuracy and robustness of CAD systems. However, TRUS video analysis also introduces new challenges: information redundancy, which increases computational cost; high intra- and inter-class similarity, which complicates feature extraction; and a low signal-to-noise ratio, which hinders the identification of clinically relevant information. To address these problems, we propose a heuristic frame selection (HFS) strategy and a three-branch collaborative feature learning network (HFS-TriNet) for prostate cancer classification from TRUS videos. Specifically, selecting a clip of video frames at intervals for training mitigates redundancy. The HFS strategy dynamically initializes the starting point of each training clip, which ensures that the sampled clips span the entire video sequence. For better feature extraction, in addition to a regular ResNet50 branch, we use 1) a large-model branch based on a pre-trained medical Segment Anything Model (SAM) to extract deep features from each frame, together with a normalization-based attention module to explore temporal consistency; and 2) a wavelet transform convolutional residual (WTCR) branch that extracts lesion edge information in the high-frequency domain and performs denoising in the low-frequency domain.
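
The abstract does not give the exact HFS rule, so the following is only a minimal sketch of the general idea: sample a clip at a regular interval, and shift the starting frame from one epoch to the next so that successive clips cover different parts of the video. The function name `select_clip`, the epoch-based offset, and the choice of stride are illustrative assumptions, not the authors' implementation.

```python
from typing import List


def select_clip(num_frames: int, clip_len: int, epoch: int) -> List[int]:
    """Return `clip_len` frame indices sampled at a regular interval.

    Assumption (not from the paper): the stride is num_frames // clip_len,
    and the starting frame cycles through 0 .. stride-1 as the epoch grows.
    """
    if num_frames < 1 or clip_len < 1:
        raise ValueError("num_frames and clip_len must be positive")

    # Interval between sampled frames; at least 1 for very short videos.
    stride = max(num_frames // clip_len, 1)

    # Shift the starting point each epoch so different frames are visited.
    start = epoch % stride

    # Clamp to the last frame in case the video is shorter than the clip.
    return [min(start + i * stride, num_frames - 1) for i in range(clip_len)]


if __name__ == "__main__":
    # A 12-frame video sampled as 4-frame clips over three epochs.
    for epoch in range(3):
        print(epoch, select_clip(num_frames=12, clip_len=4, epoch=epoch))
```

In this simplified version, every frame is visited within `stride` consecutive epochs when `num_frames` is a multiple of `clip_len`; otherwise a few trailing frames are never sampled. A fuller strategy would need to handle that remainder.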