🤖 AI Summary
Existing methods for 3D scene evaluation focus on reconstruction fidelity and photorealism and largely overlook higher-level aesthetic attributes such as composition and harmony. The field also lacks 3D Gaussian Splatting (3DGS) datasets with aesthetic annotations. To address this gap, this work proposes Aes3D, the first aesthetic assessment framework tailored for 3DGS. It introduces Aesthetic3D, the first dataset with human-annotated aesthetic scores for 3D scenes, and presents Aes3DGSNet, a lightweight end-to-end model that predicts scene-level aesthetic scores directly from raw 3D Gaussian primitives without rendering multi-view images. By avoiding an image-based rendering pipeline, the approach reduces computational cost and hardware requirements. It achieves strong performance while remaining compact and establishes a new benchmark for 3D scene aesthetic assessment.
📝 Abstract
As 3D Gaussian Splatting (3DGS) gains attention in immersive media and digital content creation, assessing the aesthetics of 3D scenes becomes important for helping creators build more visually compelling 3D content. However, existing evaluation methods for 3D scenes primarily emphasize reconstruction fidelity and perceptual realism, largely overlooking higher-level aesthetic attributes such as composition, harmony, and visual appeal. This limitation stems from two key challenges: (1) the absence of general 3DGS datasets with aesthetic annotations, and (2) the intrinsic nature of 3DGS as a low-level primitive representation, which makes it difficult to capture high-level aesthetic features. To address these challenges, we propose Aes3D, the first systematic framework for assessing the aesthetics of 3D neural rendering scenes. Aes3D includes Aesthetic3D, the first dataset dedicated to 3D scene aesthetic assessment, built on our proposed annotation strategy for 3D scene aesthetics. In addition, we present Aes3DGSNet, a lightweight model that directly predicts scene-level aesthetic scores from 3DGS representations. Notably, our model operates solely on 3D Gaussian primitives, eliminating the need to render multi-view images and thus reducing computational cost and hardware requirements. Through aesthetics-supervised learning on multi-view 3DGS scene representations, Aes3DGSNet effectively captures high-level aesthetic cues and accurately regresses aesthetic scores. Experimental results demonstrate that our approach achieves strong performance while maintaining a lightweight design, establishing a new benchmark for 3D scene aesthetic assessment. Code and datasets will be made available in a future version.