🤖 AI Summary
To address the challenge of simultaneously achieving high-fidelity focused visualization of specific objects and computational efficiency in large-scale cultural heritage scenes, this paper proposes a two-level decoupled neural radiance field (NeRF) framework: a global Scene NeRF jointly modeled with multiple object-level Region-of-Interest (ROI) NeRFs. We introduce an object-aware camera grouping mechanism and ray-level compositing rendering to enable collaborative training and real-time, seamless integration of ROI NeRFs with the Scene NeRF. This design facilitates fine-grained, low-redundancy, and scalable object-level level-of-detail (LOD) control. Evaluated on real-world cultural heritage datasets, our method achieves significant improvements in PSNR and SSIM within target regions, markedly reduces visual artifacts, and attains inference speed comparable to a single NeRF—without introducing additional computational overhead.
📝 Abstract
Efficient and accurate 3D reconstruction is essential for applications in cultural heritage. This study addresses the challenge of visualizing objects within large-scale scenes at a high level of detail (LOD) using Neural Radiance Fields (NeRFs). The aim is to improve the visual fidelity of chosen objects while maintaining the efficiency of the computations by focusing on details only for relevant content. The proposed ROI-NeRFs framework divides the scene into a Scene NeRF, which represents the overall scene at moderate detail, and multiple ROI NeRFs that focus on user-defined objects of interest. An object-focused camera selection module automatically groups relevant cameras for each NeRF training during the decomposition phase. In the composition phase, a Ray-level Compositional Rendering technique combines information from the Scene NeRF and ROI NeRFs, allowing simultaneous multi-object rendering composition. Quantitative and qualitative experiments conducted on two real-world datasets, including one on a complex eighteen's century cultural heritage room, demonstrate superior performance compared to baseline methods, improving LOD for object regions, minimizing artifacts, and without significantly increasing inference time.