🤖 AI Summary
In multi-view 3D scene modeling, achieving both high-fidelity rendering and accurate geometric reconstruction remains challenging due to inherent trade-offs between the two tasks and heavy computational and memory overhead. To address this, we propose CarGS, a unified framework built upon 3D Gaussian Splatting (3DGS). CarGS introduces a contribution-adaptive regularization mechanism that distills geometric priors into a compact MLP, together with a geometry-guided densification strategy jointly informed by surface normals and implicit Signed Distance Fields (SDFs), effectively mitigating the rendering-reconstruction conflict. Experiments demonstrate that CarGS achieves state-of-the-art performance in both rendering quality (PSNR/SSIM) and geometric reconstruction accuracy (Chamfer distance/F-score), while maintaining real-time inference speed and low GPU memory consumption (<2 GB), significantly outperforming existing decoupled and jointly optimized approaches.
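The summary does not specify the architecture of the lightweight MLP or its inputs. Purely as an illustration, the following minimal sketch shows one plausible shape of such a component: a tiny two-layer MLP mapping per-Gaussian features to a contribution weight in (0, 1). All names, feature choices, and sizes here are hypothetical assumptions, not details from the paper.

```python
import numpy as np

def contribution_mlp(features, w1, b1, w2, b2):
    """Tiny two-layer MLP: per-Gaussian features -> contribution weight in (0, 1).

    features: (N, D) array of per-Gaussian descriptors (e.g. scale, opacity,
    normal); the exact inputs are an assumption, not given by the summary.
    """
    h = np.maximum(features @ w1 + b1, 0.0)           # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))       # sigmoid -> (0, 1)

rng = np.random.default_rng(0)
N, D, H = 5, 7, 16                                    # 5 Gaussians, 7 features
feats = rng.normal(size=(N, D))
w1, b1 = rng.normal(size=(D, H)) * 0.1, np.zeros(H)
w2, b2 = rng.normal(size=(H, 1)) * 0.1, np.zeros(1)
weights = contribution_mlp(feats, w1, b1, w2, b2)     # shape (N, 1), all in (0, 1)
```

In a real pipeline these weights would modulate how strongly each Gaussian is penalized by (or exempted from) the geometry regularizer; here the weights are just a function of random parameters.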
📝 Abstract
Representing 3D scenes from multi-view images is a core challenge in computer vision and graphics, requiring both precise rendering and accurate reconstruction. Recently, 3D Gaussian Splatting (3DGS) has garnered significant attention for its high-quality rendering and fast inference speed. Yet, due to the unstructured and irregular nature of Gaussian point clouds, ensuring accurate geometry reconstruction remains difficult. Existing methods primarily focus on geometry regularization, with common approaches including primitive-based and dual-model frameworks. However, the former suffers from inherent conflicts between rendering and reconstruction, while the latter is computationally and storage-intensive. To address these challenges, we propose CarGS, a unified model leveraging Contribution-adaptive regularization to achieve simultaneous high-quality rendering and surface reconstruction. The essence of our framework is learning adaptive contributions for Gaussian primitives by squeezing the knowledge from geometry regularization into a compact MLP. Additionally, we introduce a geometry-guided densification strategy with clues from both normals and Signed Distance Fields (SDFs) to improve the capability of capturing high-frequency details. Our design improves the mutual learning of the two tasks; meanwhile, its unified structure does not require the separate models used in dual-model approaches, guaranteeing efficiency. Extensive experiments demonstrate that CarGS achieves state-of-the-art (SOTA) results in both rendering fidelity and reconstruction accuracy while maintaining real-time speed and minimal storage size.
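The abstract says densification is guided by clues from both normals and SDFs but gives no rule. As a purely hypothetical sketch (the thresholds, inputs, and logic below are assumptions, not the paper's method), one can imagine a decision like: split a Gaussian only when it lies near the implicit surface (small |SDF|) *and* local normals vary strongly, which suggests under-resolved high-frequency geometry.

```python
def should_densify(sdf_value, normal_variation, sdf_eps=0.01, var_thresh=0.5):
    """Toy densification rule (hypothetical thresholds): densify a Gaussian
    only when it is near the implicit surface (|SDF| < sdf_eps) AND the local
    normal variation is high, hinting at missed high-frequency detail."""
    return abs(sdf_value) < sdf_eps and normal_variation > var_thresh

# Near the surface in a high-curvature region -> densify.
print(should_densify(0.003, 0.9))   # True
# Far from the surface, or on a flat patch -> leave alone.
print(should_densify(0.2, 0.9))     # False
print(should_densify(0.003, 0.1))   # False
```

The point of combining both signals is that either one alone over-triggers: |SDF| alone densifies flat regions near the surface, while normal variation alone densifies floaters far from any geometry.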