🤖 AI Summary
To address the low reconstruction fidelity and severe geometric detail loss in 3D variational autoencoders (VAEs), this paper proposes a hybrid implicit representation integrating tri-planes and octrees. The method embeds an octree structure into the VAE encoder—marking the first such integration—to enable non-uniform surface-aware encoding and explicit 3D topological modeling. It further introduces a hybrid latent space combining tri-planes with a low-resolution voxel grid, balancing global shape coherence and local geometric detail. Geometric-aware sampling and multi-scale tri-plane representations are incorporated to enhance surface fidelity. Evaluated on ShapeNet and other benchmarks, the approach achieves significant improvements: +2.1 dB in PSNR and −38% reduction in Chamfer distance. It enables high-fidelity reconstruction of complex topologies and fine-grained geometry, establishing a robust latent foundation for high-quality 3D diffusion-based generation.
📝 Abstract
Recent 3D content generation pipelines often leverage Variational Autoencoders (VAEs) to encode shapes into compact latent representations, facilitating diffusion-based generation. Efficiently compressing 3D shapes while preserving intricate geometric details remains a key challenge. Existing 3D shape VAEs often employ uniform point sampling and 1D/2D latent representations, such as vector sets or triplanes, leading to significant geometric detail loss due to inadequate surface coverage and the absence of explicit 3D representations in the latent space. Although recent work explores 3D latent representations, their large scale hinders high-resolution encoding and efficient training. Given these challenges, we introduce Hyper3D, which enhances VAE reconstruction through efficient 3D representation that integrates hybrid triplane and octree features. First, we adopt an octree-based feature representation to embed mesh information into the network, mitigating the limitations of uniform point sampling in capturing geometric distributions along the mesh surface. Furthermore, we propose a hybrid latent space representation that integrates a high-resolution triplane with a low-resolution 3D grid. This design not only compensates for the lack of explicit 3D representations but also leverages a triplane to preserve high-resolution details. Experimental results demonstrate that Hyper3D outperforms traditional representations by reconstructing 3D shapes with higher fidelity and finer details, making it well-suited for 3D generation pipelines.