Dual Cross-Attention Siamese Transformer for Rectal Tumor Regrowth Assessment in Watch-and-Wait Endoscopy

📅 2025-12-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: For rectal cancer patients who achieve a clinical complete response (cCR) after neoadjuvant therapy, the watch-and-wait (WW) strategy is increasingly adopted; however, early, objective, and accurate detection of local regrowth (LR) during endoscopic surveillance remains an unmet need. Method: We propose a registration-free framework for analyzing paired endoscopic images from two time points (restaging and follow-up): a Siamese network built on a pretrained Swin Transformer, augmented with a dual cross-attention mechanism that strengthens feature interaction between the two time points, and integrated with longitudinal contrastive learning. Contribution/Results: Trained on image pairs from 135 patients and evaluated on a held-out set of image pairs from 62 patients, our model achieves 81.76% balanced accuracy, 90.07% sensitivity, and 72.86% specificity for distinguishing LR from cCR. UMAP clustering of the extracted features shows well-separated classes, and performance remains stable in the presence of common endoscopic artifacts (blood, stool, telangiectasia, and poor image quality). The approach offers robust AI-assisted decision support for monitoring patients under WW management.
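
As a reminder of how the three reported metrics relate to one another, here is a minimal sketch that computes sensitivity, specificity, and balanced accuracy from binary predictions. The labels below are hypothetical toy values (1 = LR, 0 = cCR), not data from the paper, and the function name `binary_metrics` is an illustrative choice.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, and balanced accuracy for binary labels.

    Assumes 1 = local regrowth (LR), 0 = clinical complete response (cCR),
    and that both classes are present in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # LR correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # LR missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # cCR correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # cCR flagged as LR

    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Hypothetical toy labels, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Balanced accuracy weights both classes equally, which is why it is often preferred over plain accuracy when cCR and LR cases are not equally frequent.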

📝 Abstract

Increasing evidence supports watch-and-wait (WW) surveillance for patients with rectal cancer who show clinical complete response (cCR) at restaging following total neoadjuvant treatment (TNT). However, objective and accurate methods for early detection of local regrowth (LR) from follow-up endoscopy images during WW are essential to manage care and prevent distant metastases. Hence, we developed a Siamese Swin Transformer with Dual Cross-Attention (SSDCA) to combine longitudinal endoscopic images at restaging and follow-up and distinguish cCR from LR. SSDCA leverages pretrained Swin transformers to extract domain-agnostic features and enhance robustness to imaging variations. Dual cross-attention is implemented to emphasize features from the two scans without requiring any spatial alignment of images to predict response. SSDCA as well as Swin-based baselines were trained using image pairs from 135 patients and evaluated on a held-out set of image pairs from 62 patients. SSDCA produced the best balanced accuracy (81.76% $\pm$ 0.04), sensitivity (90.07% $\pm$ 0.08), and specificity (72.86% $\pm$ 0.05). Robustness analysis showed stable performance irrespective of artifacts including blood, stool, telangiectasia, and poor image quality. UMAP clustering of extracted features showed maximal inter-cluster separation (1.45 $\pm$ 0.18) and minimal intra-cluster dispersion (1.07 $\pm$ 0.19) with SSDCA, confirming discriminative representation learning.
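
The abstract does not define how inter-cluster separation and intra-cluster dispersion were measured. One common choice, assumed here purely for illustration, is the distance between class centroids (separation) and the mean distance of each point to its own class centroid (dispersion). The sketch below applies that assumed definition to hand-made 2-D points standing in for a UMAP embedding; the function name `cluster_separation` and the points themselves are hypothetical.

```python
import numpy as np


def cluster_separation(embedding, labels):
    """Centroid-based separation and dispersion for a two-class embedding.

    Assumed definitions (not taken from the paper):
      inter = Euclidean distance between the two class centroids
      intra = mean Euclidean distance of each point to its own class centroid
    """
    embedding = np.asarray(embedding, dtype=float)
    labels = np.asarray(labels)
    centroids = {c: embedding[labels == c].mean(axis=0) for c in (0, 1)}

    inter = np.linalg.norm(centroids[0] - centroids[1])
    own_centroid = np.stack([centroids[c] for c in labels])
    intra = np.linalg.norm(embedding - own_centroid, axis=1).mean()
    return inter, intra


# Hand-made 2-D points standing in for a UMAP embedding (0 = cCR, 1 = LR).
points = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
labels = [0, 0, 1, 1]
inter, intra = cluster_separation(points, labels)
print(inter, intra)
```

Under this definition, a larger `inter` together with a smaller `intra` indicates that the two classes occupy more distinct regions of the embedding.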

Problem

Research questions and friction points this paper is trying to address.

Develops a Siamese Transformer to detect rectal tumor regrowth from endoscopic images

Aims to distinguish complete response from local regrowth using longitudinal image pairs

Enhances robustness to imaging variations without requiring spatial alignment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Siamese Swin Transformer with Dual Cross-Attention for image analysis

Dual cross-attention emphasizes features without spatial alignment

Pretrained Swin transformers extract domain-agnostic robust features
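
The page does not give the exact wiring of the dual cross-attention module, so the following is only a generic sketch of bidirectional cross-attention between two token sets, written in NumPy. Random arrays stand in for the Swin features of the restaging and follow-up images, and the single-head design, separate projection weights per direction, mean pooling, and concatenation are all assumptions, as are every name and size below.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_attention(query_tokens, context_tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product attention in which the tokens of one
    image query the tokens of the other image."""
    q = query_tokens @ w_q      # (n_query, d_head)
    k = context_tokens @ w_k    # (n_context, d_head)
    v = context_tokens @ w_v    # (n_context, d_head)
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]))  # (n_query, n_context)
    return weights @ v          # (n_query, d_head)


def dual_cross_attention(restaging, followup, params_r2f, params_f2r):
    """Attend in both directions, then pool and concatenate (assumed fusion)."""
    r2f = cross_attention(restaging, followup, *params_r2f)
    f2r = cross_attention(followup, restaging, *params_f2r)
    # Every token attends over all tokens of the other image, so no spatial
    # registration between the two images is needed.
    return np.concatenate([r2f.mean(axis=0), f2r.mean(axis=0)])


def make_params(rng, d_model, d_head):
    """Random query/key/value projections (placeholders for learned weights)."""
    return tuple(
        rng.normal(scale=d_model ** -0.5, size=(d_model, d_head))
        for _ in range(3)
    )


rng = np.random.default_rng(0)
n_tokens, d_model, d_head = 49, 32, 16  # hypothetical sizes
restaging = rng.normal(size=(n_tokens, d_model))  # stand-in for Swin features
followup = rng.normal(size=(n_tokens, d_model))   # stand-in for Swin features

fused = dual_cross_attention(
    restaging,
    followup,
    make_params(rng, d_model, d_head),
    make_params(rng, d_model, d_head),
)
print(fused.shape)  # (32,)
```

In a full model, a vector like `fused` would feed a classification head that outputs cCR versus LR; that head is omitted here.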

🔎 Similar Papers

Swin transformers are robust to distribution and concept drift in endoscopy-based longitudinal rectal cancer assessment