AI Summary
This work addresses geometric discontinuities in 3D reconstruction of soft tissue in surgical scenes. Low texture, specular reflections, and instrument occlusions cause these discontinuities, and existing fixed-topology methods struggle to model them. The authors propose the EndoVGGT framework, whose Deformation-aware Graph Attention (DeGAT) module replaces static neighborhood structures with semantic graphs constructed dynamically in feature space, capturing long-range dependencies among tissue regions. This enables structure propagation under occlusion and recovery of non-rigid deformations. By integrating graph neural network-based depth estimation, dynamic graph construction, and geometric prior learning, the method achieves zero-shot cross-domain generalization. On the SCARED dataset, it improves PSNR by 24.6% and SSIM by 9.1% over the prior state of the art, and it generalizes zero-shot to the unseen SCARED and EndoNeRF datasets.
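For readers unfamiliar with the fidelity metric, the following minimal sketch shows how PSNR and a relative gain could be computed. It assumes images flattened to lists of intensities in [0, 1], and it assumes the reported percentages are relative changes of the metric value, which the source does not state. The helper names `psnr` and `relative_gain_percent` are hypothetical and do not come from the paper.

```python
import math


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized flat images."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def relative_gain_percent(baseline, improved):
    """Relative change of a metric, as a percentage of the baseline value.

    Assumption: this is how a figure such as "PSNR improved by X%" is derived.
    """
    return 100.0 * (improved - baseline) / baseline
```

Under this reading, a percentage gain in PSNR is a ratio of dB values, so it is not the same thing as a percentage reduction in pixel error.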
Abstract
Accurate 3D reconstruction of deformable soft tissues is essential for surgical robotic perception. However, low-texture surfaces, specular highlights, and instrument occlusions often fragment geometric continuity, posing a challenge for existing fixed-topology approaches. To address this, we propose EndoVGGT, a geometry-centric framework equipped with a Deformation-aware Graph Attention (DeGAT) module. Rather than using static spatial neighborhoods, DeGAT dynamically constructs feature-space semantic graphs to capture long-range correlations among coherent tissue regions. This enables robust propagation of structural cues across occlusions, enforcing global consistency and improving non-rigid deformation recovery. Extensive experiments on SCARED show that our method significantly improves fidelity, increasing PSNR by 24.6% and SSIM by 9.1% over the prior state of the art. Crucially, EndoVGGT exhibits strong zero-shot cross-dataset generalization to the unseen SCARED and EndoNeRF domains, confirming that DeGAT learns domain-agnostic geometric priors. These results highlight the efficacy of dynamic feature-space modeling for consistent surgical 3D reconstruction.
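The idea of building neighborhoods in feature space rather than on the pixel grid can be illustrated with a small sketch. This is not the authors' DeGAT implementation: it assumes cosine-similarity k-nearest neighbors for graph construction and unlearned scaled dot-product attention for aggregation, where a real module would use learned projections. The function names `knn_feature_graph` and `graph_attention` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def knn_feature_graph(features, k):
    """For each node, list the k other nodes most similar in feature space.

    The graph is rebuilt from the current features on every call, so it
    changes as the features change (the "dynamic" part of the idea).
    """
    neighbors = []
    for i, f in enumerate(features):
        scored = [(cosine(f, g), j) for j, g in enumerate(features) if j != i]
        scored.sort(key=lambda s: s[0], reverse=True)
        neighbors.append([j for _, j in scored[:k]])
    return neighbors


def graph_attention(features, neighbors):
    """Softmax-weighted average of each node and its graph neighbors.

    Assumption: plain scaled dot-product scores without learned weights.
    """
    dim = len(features[0])
    out = []
    for i, nbrs in enumerate(neighbors):
        idx = [i] + nbrs  # include a self-loop
        logits = [
            sum(a * b for a, b in zip(features[i], features[j])) / math.sqrt(dim)
            for j in idx
        ]
        peak = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(l - peak) for l in logits]
        total = sum(weights)
        out.append([
            sum(w * features[j][d] for w, j in zip(weights, idx)) / total
            for d in range(dim)
        ])
    return out


# Four patches in raster order; patches 0 and 2 look alike but are not adjacent.
patches = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
graph = knn_feature_graph(patches, k=1)
refined = graph_attention(patches, graph)
```

In this toy example, patch 0 is linked to patch 2 rather than to its raster neighbor patch 1, which is the kind of long-range connection a fixed spatial neighborhood would miss when an instrument or highlight separates two regions of the same tissue.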