RaySt3R: Predicting Novel Depth Maps for Zero-Shot Object Completion

📅 2025-06-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing 3D shape completion methods suffer from geometric inconsistency, blurry boundaries, and high computational overhead. This paper proposes a zero-shot 3D object completion paradigm that reformulates completion as novel-view depth map synthesis: given a single RGB-D image and arbitrary query rays, the model predicts depth, binary masks, and per-pixel confidence scores end-to-end. Its key ingredients are ray-level modeling with a feed-forward Transformer architecture, combined with multi-view depth fusion and confidence-weighted reconstruction, enabling cross-category generalization without category-specific priors or fine-tuning. Evaluated on both synthetic and real-world datasets, the method achieves state-of-the-art performance, reducing 3D Chamfer distance by up to 44% relative to the best baseline, while ensuring geometric consistency, sharp boundary reconstruction, and efficient inference.

📝 Abstract
3D shape completion has broad applications in robotics, digital twin reconstruction, and extended reality (XR). Although recent advances in 3D object and scene completion have achieved impressive results, existing methods lack 3D consistency, are computationally expensive, and struggle to capture sharp object boundaries. Our work (RaySt3R) addresses these limitations by recasting 3D shape completion as a novel view synthesis problem. Specifically, given a single RGB-D image and a novel viewpoint (encoded as a collection of query rays), we train a feedforward transformer to predict depth maps, object masks, and per-pixel confidence scores for those query rays. RaySt3R fuses these predictions across multiple query views to reconstruct complete 3D shapes. We evaluate RaySt3R on synthetic and real-world datasets, and observe it achieves state-of-the-art performance, outperforming the baselines on all datasets by up to 44% in 3D chamfer distance. Project page: https://rayst3r.github.io
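To make the abstract's setup concrete, here is a minimal sketch of how a novel viewpoint can be encoded as a collection of per-pixel query rays. This is an illustration of the standard pinhole back-projection idea, not RaySt3R's actual code; the function name, the world-from-camera `R, t` convention, and the pixel-center offset are assumptions.

```python
import numpy as np

def make_query_rays(K, R, t, h, w):
    """Encode a novel viewpoint as per-pixel query rays (illustrative sketch).

    K: 3x3 camera intrinsics; R (3x3), t (3,): assumed world-from-camera
    rotation and camera center; h, w: target resolution.
    Returns ray origins (h*w, 3) and unit directions (h*w, 3) in world frame.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    # Homogeneous pixel coordinates at pixel centers (offset is an assumption).
    pix = np.stack([xs.ravel() + 0.5, ys.ravel() + 0.5, np.ones(h * w)], axis=1)
    dirs_cam = pix @ np.linalg.inv(K).T          # back-project pixels to camera-frame rays
    dirs_world = dirs_cam @ R.T                  # rotate ray directions into the world frame
    dirs_world /= np.linalg.norm(dirs_world, axis=1, keepdims=True)
    origins = np.broadcast_to(t, (h * w, 3))     # all rays share the camera center
    return origins, dirs_world
```

A depth prediction `d` for a ray then lifts to the 3D point `origin + d * direction`, which is how per-ray depths become geometry.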
Problem

Research questions and friction points this paper is trying to address.

Predicting depth maps for 3D object completion
Ensuring 3D consistency and sharp boundaries
Achieving efficient novel view synthesis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Recasts 3D completion as view synthesis
Uses transformer for depth and mask prediction
Fuses multi-view predictions for 3D shapes
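The fusion step above can be sketched as follows: per-view depth predictions are lifted to 3D points along their query rays, and only points whose predicted mask is on and whose confidence clears a threshold are kept. This is a hypothetical illustration of confidence-filtered multi-view fusion; the dictionary keys, the threshold value, and the function name are assumptions, not RaySt3R's actual interface.

```python
import numpy as np

def fuse_views(preds, conf_thresh=0.5):
    """Fuse per-view ray predictions into one point cloud (illustrative sketch).

    preds: list of dicts, one per query view, with keys (all assumptions):
      'origins' (N,3), 'dirs' (N,3) ray geometry, 'depth' (N,) predicted depth,
      'mask' (N,) boolean foreground mask, 'conf' (N,) per-pixel confidence.
    Returns fused points (M,3) and their confidences (M,) for downstream
    confidence-weighted reconstruction.
    """
    pts, confs = [], []
    for p in preds:
        # Keep only foreground rays whose confidence clears the threshold.
        keep = p["mask"] & (p["conf"] > conf_thresh)
        pts.append(p["origins"][keep] + p["depth"][keep, None] * p["dirs"][keep])
        confs.append(p["conf"][keep])
    return np.concatenate(pts, axis=0), np.concatenate(confs, axis=0)
```

Thresholding on confidence is one simple way to realize "confidence-weighted reconstruction"; the paper's actual weighting scheme may differ.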