🤖 AI Summary
Existing NeRF and 3D Gaussian Splatting (3DGS) methods for non-cooperative spacecraft 3D reconstruction rely on accurate pose priors and incur high computational costs, rendering them impractical for resource-constrained space environments. Method: We propose a CNN-3DGS collaborative framework featuring a lightweight CNN-based primitive initialization network that directly generates differentiable 3D Gaussian point clouds under noisy or implicit pose conditions. It integrates multi-hypothesis pose estimation, implicit pose optimization, and primitive assembly to enable end-to-end joint pose–geometry optimization. Contribution/Results: Experiments demonstrate high-fidelity reconstruction even under incomplete pose supervision, with ~90% reduction in training iterations and successful modeling from only 2–5 monocular images. The method significantly improves efficiency and robustness for on-orbit 3D reconstruction applications.
📄 Abstract
The advent of novel view synthesis techniques such as NeRF and 3D Gaussian Splatting (3DGS) has enabled learning precise 3D models solely from posed monocular images. Although these methods are attractive, they have two major limitations that prevent their use in space applications: they require known poses during training, and they incur high computational cost at both training and inference. To address these limitations, this work contributes: (1) a Convolutional Neural Network (CNN) based primitive initializer for 3DGS using monocular images; (2) a pipeline capable of training with noisy or implicit pose estimates; and (3) an analysis of initialization variants that reduce the training cost of precise 3D models. A CNN takes a single image as input and outputs a coarse 3D model represented as an assembly of primitives, along with the target's pose relative to the camera. This assembly of primitives is then used to initialize 3DGS, significantly reducing the number of training iterations and input images needed -- by at least an order of magnitude. For additional flexibility, the CNN component has multiple variants with different pose estimation techniques. This work compares these variants, evaluating their effectiveness for downstream 3DGS training under noisy or implicit pose estimates. The results demonstrate that even with imperfect pose supervision, the pipeline learns high-fidelity 3D representations, opening the door to the use of novel view synthesis in space applications.
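The image-to-primitives-to-Gaussians data flow described above can be sketched roughly as follows. This is a minimal illustration, not the paper's architecture: `cnn_initializer` is a stand-in for the trained CNN (a fixed random projection over a crude global image feature), and the primitive parameterization (ellipsoid centers and axis lengths), the pose encoding, and all function names and sample counts are illustrative assumptions. The key idea it demonstrates is seeding the 3DGS point cloud from the coarse primitive assembly instead of from scratch.

```python
import numpy as np

def cnn_initializer(image: np.ndarray, num_primitives: int = 8):
    """Stand-in for the CNN: maps a single monocular image to a coarse
    primitive assembly (ellipsoid centers + axis lengths) and a relative
    camera pose. A fixed random projection substitutes for learned weights."""
    rng = np.random.default_rng(0)
    feat = image.mean(axis=-1).ravel()  # crude global feature vector
    W = rng.standard_normal((num_primitives * 6 + 6, feat.size)) / np.sqrt(feat.size)
    out = W @ feat
    prims = out[: num_primitives * 6].reshape(num_primitives, 6)
    centers, log_axes = prims[:, :3], prims[:, 3:]
    pose = out[-6:]  # hypothetical encoding: 3 axis-angle rotation + 3 translation
    return centers, np.exp(log_axes), pose

def primitives_to_gaussians(centers, axes, gaussians_per_primitive=64):
    """Seed 3DGS: sample Gaussian means inside each ellipsoid primitive and
    use the ellipsoid axes to set initial anisotropic scales."""
    rng = np.random.default_rng(1)
    means, scales = [], []
    for c, a in zip(centers, axes):
        u = rng.standard_normal((gaussians_per_primitive, 3))
        u /= np.linalg.norm(u, axis=1, keepdims=True)        # random directions
        r = rng.uniform(0.0, 1.0, (gaussians_per_primitive, 1)) ** (1 / 3)
        means.append(c + u * r * a)                           # points inside ellipsoid
        scales.append(np.tile(a / 4, (gaussians_per_primitive, 1)))
    return np.concatenate(means), np.concatenate(scales)

image = np.zeros((64, 64, 3))  # placeholder monocular frame
centers, axes, pose = cnn_initializer(image)
means, scales = primitives_to_gaussians(centers, axes)
print(means.shape, scales.shape)  # (512, 3) (512, 3)
```

The resulting `means` and `scales` would then serve as the initial Gaussian parameters for splatting optimization, which is what allows the reported reduction in training iterations relative to random or SfM-based initialization.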