Dual Cross-Attention Siamese Transformer for Rectal Tumor Regrowth Assessment in Watch-and-Wait Endoscopy

📅 2025-12-03

🤖 AI Summary
For rectal cancer patients achieving clinical complete response (cCR) after neoadjuvant therapy, the watch-and-wait (WW) strategy is increasingly adopted; however, early, objective, and accurate detection of local regrowth (LR) during endoscopic surveillance remains a critical unmet need. Method: We propose a registration-free framework for analyzing paired endoscopic images from two time points: a Siamese network built upon a pretrained Swin Transformer, augmented with a novel dual cross-attention mechanism to enhance feature interaction between the two time points, and integrated with longitudinal contrastive learning. Contribution/Results: Evaluated on a held-out set of image pairs from 62 patients, our model achieves 81.76% balanced accuracy, 90.07% sensitivity, and 72.86% specificity for LR detection. Feature clustering demonstrates strong discriminative capability, and the model exhibits robustness against common endoscopic artifacts. The approach delivers interpretable, highly robust AI-assisted decision support for precise dynamic monitoring in WW management.

📝 Abstract
Increasing evidence supports watch-and-wait (WW) surveillance for patients with rectal cancer who show clinical complete response (cCR) at restaging following total neoadjuvant treatment (TNT). However, objective and accurate methods for early detection of local regrowth (LR) from follow-up endoscopy images during WW are essential to manage care and prevent distant metastases. Hence, we developed a Siamese Swin Transformer with Dual Cross-Attention (SSDCA) that combines longitudinal endoscopic images from restaging and follow-up to distinguish cCR from LR. SSDCA leverages pretrained Swin transformers to extract domain-agnostic features and enhance robustness to imaging variations. Dual cross-attention is implemented to emphasize features from the two scans without requiring any spatial alignment of the images to predict response. SSDCA and Swin-based baselines were trained using image pairs from 135 patients and evaluated on a held-out set of image pairs from 62 patients. SSDCA produced the best balanced accuracy (81.76% $\pm$ 0.04), sensitivity (90.07% $\pm$ 0.08), and specificity (72.86% $\pm$ 0.05). Robustness analysis showed stable performance irrespective of artifacts including blood, stool, telangiectasia, and poor image quality. UMAP clustering of extracted features showed maximal inter-cluster separation (1.45 $\pm$ 0.18) and minimal intra-cluster dispersion (1.07 $\pm$ 0.19) with SSDCA, confirming discriminative representation learning.
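
The three headline metrics in the abstract follow their standard definitions from a binary confusion matrix. As a minimal sketch, assuming LR is the positive class and using hypothetical counts that are not taken from the paper (the function name `response_metrics` and the counts `tp`, `fn`, `tn`, `fp` are illustrative only):

```python
def response_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and balanced accuracy from confusion-matrix counts.

    Assumption: LR (local regrowth) is the positive class, cCR the negative class.
    """
    sensitivity = tp / (tp + fn)   # fraction of LR pairs correctly flagged
    specificity = tn / (tn + fp)   # fraction of cCR pairs correctly cleared
    balanced_accuracy = (sensitivity + specificity) / 2
    return sensitivity, specificity, balanced_accuracy


# Hypothetical counts for illustration only (not the study's data).
sens, spec, bal_acc = response_metrics(tp=3, fn=1, tn=2, fp=2)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} balanced accuracy={bal_acc:.3f}")
```

Balanced accuracy weights the two classes equally, which is useful whenever cCR and LR cases are not equally frequent in the evaluation set.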
Problem

Research questions and friction points this paper is trying to address.

Develops a Siamese Transformer to detect rectal tumor regrowth from endoscopic images
Aims to distinguish complete response from local regrowth using longitudinal image pairs
Enhances robustness to imaging variations without requiring spatial alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Siamese Swin Transformer with Dual Cross-Attention for image analysis
Dual cross-attention emphasizes features without spatial alignment
Pretrained Swin transformers extract domain-agnostic robust features
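
The idea of dual cross-attention between two time points can be sketched in a few lines. This is a simplified illustration, not the authors' implementation: it assumes each image has already been reduced to a list of feature tokens, uses identity query/key/value projections instead of learned weights, and fuses the two attended streams by mean pooling and concatenation. All function names here are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Each query token attends over every context token.

    Assumption: identity Q/K/V projections (a real model would learn these).
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        # Scaled dot-product score between this query and each context token.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        # Weighted average of the context tokens.
        attended.append([sum(w * k[j] for w, k in zip(weights, context)) for j in range(dim)])
    return attended


def dual_cross_attention(restaging, followup):
    """Run attention in both directions between the two time points."""
    restaging_view = cross_attend(restaging, followup)  # restaging queries follow-up
    followup_view = cross_attend(followup, restaging)   # follow-up queries restaging
    return restaging_view, followup_view


def mean_pool(tokens):
    """Average a list of tokens into a single feature vector."""
    return [sum(t[j] for t in tokens) / len(tokens) for j in range(len(tokens[0]))]


def fuse(restaging, followup):
    """Assumed fusion: pool each attended stream, then concatenate."""
    restaging_view, followup_view = dual_cross_attention(restaging, followup)
    return mean_pool(restaging_view) + mean_pool(followup_view)


# Toy tokens for illustration only: 3 restaging tokens, 2 follow-up tokens.
restaging_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
followup_tokens = [[0.5, 0.5], [2.0, 0.0]]
print(fuse(restaging_tokens, followup_tokens))
```

In this simplified form, every query token sums over all tokens of the other image, so the two images may contribute different numbers of tokens and reordering one image's tokens does not change what the other image reads from it. That gives some intuition for why no spatial alignment step is needed, although real Swin features also carry positional information that this sketch omits.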
Jorge Tapias Gomez
Department of Medical Physics, Memorial Sloan Kettering Cancer Center, USA
Despoina Kanata
Department of Surgery, Colorectal Service, Memorial Sloan Kettering Cancer Center, USA
A. Rangnekar
Department of Medical Physics, Memorial Sloan Kettering Cancer Center, USA
Christina Lee
National University of Singapore
J. Garcia-Aguilar
Department of Surgery, Colorectal Service, Memorial Sloan Kettering Cancer Center, USA
Joshua Jesse Smith
Department of Surgery, Colorectal Service, Memorial Sloan Kettering Cancer Center, USA
H. Veeraraghavan
Department of Medical Physics, Memorial Sloan Kettering Cancer Center, USA