🤖 AI Summary
This study addresses the lack of objective tools for early detection of local tumor regrowth in rectal cancer patients managed with a “watch-and-wait” strategy. The authors propose TREX, a model that uses longitudinal pairs of endoscopic images, one acquired at post-treatment restaging and one during follow-up, to distinguish complete response from regrowth. TREX uses pretrained Swin Transformers in a Siamese configuration and a dual cross-attention mechanism to fuse features across time points without spatial co-registration. For detecting regrowth at the last available follow-up, it reaches a sensitivity of 97% and a balanced accuracy of 90%. For early detection, it reaches accuracies of 74% at 3–6 months and 62% at 6–12 months before clinical confirmation. In a surgeon survey, its overall accuracy (86.21%) was comparable to that of attending clinicians (87.84%).
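For readers less familiar with the metrics quoted above: balanced accuracy is the mean of sensitivity and specificity, which keeps the score from being dominated by the majority class. The sketch below only illustrates the definition. The function name and the confusion-matrix counts are hypothetical and are not taken from the study.

```python
def balanced_accuracy(tp, fn, tn, fp):
    """Return (sensitivity, specificity, balanced accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)  # fraction of regrowth cases correctly flagged
    specificity = tn / (tn + fp)  # fraction of complete responses correctly cleared
    return sensitivity, specificity, (sensitivity + specificity) / 2


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
sens, spec, bal_acc = balanced_accuracy(tp=9, fn=1, tn=32, fp=8)
plain_acc = (9 + 32) / (9 + 1 + 32 + 8)

print(f"{sens:.2f} {spec:.2f} {bal_acc:.2f} {plain_acc:.2f}")  # 0.90 0.80 0.85 0.82
```

With imbalanced classes the plain accuracy (0.82 here) and the balanced accuracy (0.85 here) can differ, which is why surveillance studies often report both sensitivity and balanced accuracy.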
📝 Abstract
Clinical trials indicate a benefit of watch-and-wait (WW) surveillance for patients with rectal cancer who show a complete or near-complete clinical response (CR) directly after treatment (restaging). However, there are no objective, accurate methods for the early detection of local tumor regrowth (LR) from follow-up exams in patients undergoing WW. Hence, we developed Temporal Rectal Endoscopy Cross-attention (TREX), a longitudinal deep learning approach that combines pairs of images acquired at restaging and follow-up to distinguish CR from LR. TREX uses pretrained Swin Transformers in a Siamese setting to extract features from longitudinal images and dual cross-attention to combine the features without spatial co-registration between image pairs. TREX and Swin-based baselines were trained under two settings: (a) detecting LR or CR at the last available follow-up and (b) early detection of LR at 3–6, 6–12, and 12–24 months before clinical confirmation. TREX achieved the highest accuracy in detecting LR, with a sensitivity of 97% ± 6% and a balanced accuracy of 90% ± 3%, and outperformed all baselines in early detection at both 3–6 months (74% ± 1%) and 6–12 months (62% ± 4%) prior to clinical detection. Clinical validation via a surgeon survey showed that TREX matched attending-level overall accuracy (TREX: 86.21% vs. clinicians: 87.84% ± 1.28%). Finally, we explored TREX's ability to predict treatment response by combining pre-treatment (pre-TNT) and restaging endoscopies, achieving a balanced accuracy of 73% ± 12%. These results show that longitudinal deep learning analysis of endoscopy may improve surveillance and enable earlier identification of rectal cancer regrowth.
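The fusion step described in the abstract can be pictured with a minimal, dependency-free sketch of dual cross-attention. This is not the authors' implementation. It assumes identity query/key/value projections, a single attention head, and mean pooling, and the token values and the names `cross_attend`, `mean_pool`, and `dual_cross_attention` are hypothetical. In TREX the token features would come from the Swin Transformer encoders.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Each query token attends over all context tokens (scaled dot product).

    Assumption: identity projections, i.e. no learned W_q, W_k, W_v.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    attended = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in context]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[d] for w, v in zip(weights, context)) for d in range(dim)]
        )
    return attended


def mean_pool(tokens):
    """Average a list of token vectors into one vector."""
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(len(tokens[0]))]


def dual_cross_attention(restaging, followup):
    """Attend in both directions, pool each result, and concatenate."""
    restaging_to_followup = cross_attend(restaging, followup)
    followup_to_restaging = cross_attend(followup, restaging)
    return mean_pool(restaging_to_followup) + mean_pool(followup_to_restaging)


# Hypothetical token features; the two images need not share a token grid.
restaging = [[1.0, 0.0, 0.5], [0.2, 0.9, 0.1], [0.0, 0.3, 1.0]]
followup = [[0.8, 0.1, 0.4], [0.1, 1.0, 0.0], [0.3, 0.2, 0.9], [0.5, 0.5, 0.5]]

fused = dual_cross_attention(restaging, followup)
shuffled = dual_cross_attention(restaging, list(reversed(followup)))
```

In this simplified form, reordering the follow-up tokens leaves the fused vector unchanged up to floating-point rounding, because attention matches tokens by content rather than by position. That property is one way to see why such a fusion step does not need the two endoscopic views to be spatially aligned.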