🤖 AI Summary
To balance reconstruction quality against real-time performance for mobile autonomous aerial robots in dynamic scenes, this paper proposes a point-of-interest (PoI)-oriented, coarse-to-fine Gaussian Splatting reconstruction framework. Methodologically, it combines semantic Gaussian editing with color-based filtering, so that regions of interest can be separated early in training. Together with a semantically enhanced Gaussian representation, incremental learning, and coarse semantic initialization, this supports fine-grained, object-level reconstruction. The key contribution is presented as the first realization of stage-wise, semantics-aware Gaussian acceleration, which significantly reduces the overhead of full-scene optimization. Evaluated on the SCRREAM and NeRDS 360 datasets, the method reduces training time to under 72% of that of state-of-the-art approaches, improves novel-view-synthesis PSNR by 1.8 dB, and raises PoI reconstruction IoU by 4.3%.
📝 Abstract
Mobile reconstruction for autonomous aerial robotics holds strong potential for critical applications such as tele-guidance and disaster response. These tasks demand both accurate 3D reconstruction and fast scene processing. Instead of reconstructing the entire scene in detail, it is often more efficient to focus on specific objects, i.e., points of interest (PoIs). Mobile robots equipped with advanced sensing can usually detect these PoIs early, during data acquisition or preliminary analysis, reducing the need for full-scene optimization. Gaussian Splatting (GS) has recently shown promise in delivering high-quality novel view synthesis and 3D representation through an incremental learning process. Extending GS with semantics for scene editing adds per-splat features that make it possible to isolate objects effectively.
Semantic 3D Gaussian editing can be performed before the full training cycle is complete, which reduces the overall training time. Moreover, the semantically relevant area, the PoI, is usually already known during capture. To balance high-quality reconstruction with reduced training time, we propose CoRe-GS. We first generate a coarse, segmentation-ready scene with semantic GS and then refine it for the semantic object, using our novel color-based filtering for effective object isolation. This reduces training time by about a quarter compared with a full training cycle for semantic GS. We evaluate our approach on two datasets, SCRREAM (real-world, indoor) and NeRDS 360 (synthetic, outdoor), showing reduced runtime and higher novel-view-synthesis quality.
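The abstract does not spell out how the color-based filter works, so the following is only a minimal sketch of the general idea of isolating a PoI from a coarse semantic scene. It assumes each Gaussian carries a semantic label and an RGB color, and that a candidate can be rejected when its color lies far from the median color of the labeled object. The function name `isolate_poi`, the dictionary keys `label` and `rgb`, and the threshold `max_color_dist` are all hypothetical and are not taken from CoRe-GS.

```python
from statistics import median


def isolate_poi(gaussians, poi_label, max_color_dist=0.25):
    """Return the Gaussians that likely belong to the point of interest.

    gaussians: list of dicts with hypothetical keys
        'label' (int semantic class) and 'rgb' (three floats in [0, 1]).
    This is an illustrative filter, not the paper's actual method.
    """
    # Step 1: semantic selection, keep only splats carrying the PoI label.
    candidates = [g for g in gaussians if g["label"] == poi_label]
    if not candidates:
        return []

    # Step 2: robust reference color, the per-channel median of the candidates.
    ref = [median(g["rgb"][c] for g in candidates) for c in range(3)]

    # Step 3: color filter, drop candidates far from the reference color.
    def color_dist(rgb):
        return sum((a - b) ** 2 for a, b in zip(rgb, ref)) ** 0.5

    return [g for g in candidates if color_dist(g["rgb"]) <= max_color_dist]


scene = [
    {"label": 1, "rgb": (0.80, 0.10, 0.10)},  # PoI, red
    {"label": 1, "rgb": (0.82, 0.12, 0.09)},  # PoI, red
    {"label": 1, "rgb": (0.78, 0.11, 0.12)},  # PoI, red
    {"label": 1, "rgb": (0.10, 0.10, 0.90)},  # mislabeled blue floater
    {"label": 0, "rgb": (0.50, 0.50, 0.50)},  # background
]
poi = isolate_poi(scene, poi_label=1)
print(len(poi))  # prints 3
```

In this toy scene the background splat is removed by the label test and the mislabeled blue splat by the color test, leaving the three red splats for refinement. A single reference color is a strong simplification; a multi-colored object would need a richer color model.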