🤖 AI Summary
This work addresses the challenge of reconstructing complete, high-fidelity 3D geometry of everyday objects from low-quality, incompletely covered, and poorly calibrated consumer-grade multi-view videos, without requiring professional 3D scanners. The method leverages a single user-provided 2D point correspondence as weak supervision, integrating Structure-from-Motion, multi-view stereo matching, and neural rendering to achieve robust cross-video multi-view alignment and dense reconstruction. Crucially, it introduces multi-video collaborative modeling to mitigate occlusion-induced holes inherent in single-video reconstructions, and supports spatial registration via AR markers or checkerboard patterns. Experiments demonstrate stable reconstruction of watertight, high-fidelity meshes from unstructured smartphone videos. The approach exhibits strong generalization across diverse objects and scenes, and the code is publicly released, underscoring its practical utility for real-world 3D content creation.
📝 Abstract
How can we extract complete geometric models of objects that we encounter in our daily life, without having access to commercial 3D scanners? In this paper we present an automated system for generating geometric models of objects from two or more videos. Our system requires the specification of one known point in at least one frame of each video, which can be automatically determined using a fiducial marker such as a checkerboard or Augmented Reality (AR) marker. The remaining frames are automatically positioned in world space by using Structure-from-Motion techniques. By using multiple videos and merging results, a complete object mesh can be generated, without having to rely on hole filling. Code for our system is available from https://github.com/FlorisE/NeuralMeshing.