🤖 AI Summary
This work addresses substantial depth errors in the original SCARED dataset. Depth maps for non-keyframe images depend on camera poses derived from robot kinematics, and the resulting pose errors leave only the 35 keyframes reliably labeled. To overcome this limitation, the authors use COLMAP, a structure-from-motion (SfM) system, to re-estimate camera poses for all frames, then align the resulting reconstructions to the ground-truth keyframe depth maps to recover metric scale. The corrected dataset, SCARED-C, expands the number of reliable RGB-D pairs from 35 to 17,135. The corrected poses are validated through stereo disparity evaluation and monocular depth estimation experiments, and the corrected dataset and code are publicly released.
📝 Abstract
The SCARED dataset is a widely used benchmark for endoscopic depth estimation, offering ground-truth 3D reconstructions captured with a structured light sensor. However, the depth maps for non-keyframe images rely on robot kinematics that introduce substantial pose errors, limiting the reliably labeled portion of the dataset to 35 keyframes. We present SCARED-C, a corrected version of the SCARED dataset that expands the number of reliable RGB-D pairs from 35 to 17,135. Our pipeline applies COLMAP, a Structure-from-Motion system, to re-estimate camera poses for all frames, followed by a scale recovery step that aligns the resulting reconstructions to metric space using the ground-truth keyframe depth maps. We validate the corrected poses through (1) stereo disparity evaluation and (2) monocular depth estimation experiments. The corrected dataset and code are publicly released to the community.
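The scale recovery step can be illustrated with a minimal sketch. The abstract does not specify how the alignment is computed, so the code below assumes one common approach: a single scalar scale per reconstruction, estimated as the median ratio between ground-truth keyframe depth and SfM-rendered depth over pixels valid in both. The function names `recover_metric_scale` and `apply_scale_to_pose`, the validity threshold, and the world-to-camera pose convention are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def recover_metric_scale(sfm_depth, gt_depth, min_depth=1e-6):
    """Estimate a scalar s such that s * sfm_depth approximates gt_depth.

    Assumption: both maps are rendered for the same keyframe and are
    pixel-aligned; invalid pixels are encoded as 0, NaN, or inf.
    """
    sfm_depth = np.asarray(sfm_depth, dtype=np.float64)
    gt_depth = np.asarray(gt_depth, dtype=np.float64)

    # Keep only pixels where both depth values are finite and positive.
    valid = (
        np.isfinite(sfm_depth)
        & np.isfinite(gt_depth)
        & (sfm_depth > min_depth)
        & (gt_depth > min_depth)
    )
    if not valid.any():
        raise ValueError("no overlapping valid depth pixels")

    # The median ratio is robust to a moderate fraction of outlier pixels.
    return float(np.median(gt_depth[valid] / sfm_depth[valid]))


def apply_scale_to_pose(R, t, scale):
    """Rescale a world-to-camera pose (x_cam = R @ X_world + t).

    Scaling the scene uniformly by `scale` leaves the rotation unchanged
    and multiplies the translation by `scale`.
    """
    R = np.asarray(R, dtype=np.float64)
    t = np.asarray(t, dtype=np.float64)
    return R, scale * t
```

Under these assumptions, the scale estimated on a keyframe would be applied to every camera translation in the same reconstruction, so that depth maps rendered for non-keyframes are expressed in the same metric units as the structured-light ground truth.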