🤖 AI Summary
Traditional 3D compression methods degrade severely at extreme compression ratios (e.g., 100×), producing texture distortion, mesh collapse, and mesh gaps. To address this, we propose a semantic compression paradigm that sets aside structural fidelity, instead encoding 3D objects as natural language descriptions and using large generative models to reconstruct them. This work introduces the first end-to-end, semantics-driven 3D reconstruction framework in which natural language serves as the sole compression medium. Our approach integrates CLIP-based feature alignment, text-conditioned diffusion modeling, and geometry-aware prior guidance. Evaluated on Objaverse, it achieves compression ratios of up to 105× and can outperform geometric codecs (e.g., Draco) in the quality-preserving regime around 100×. Because the compressed representation is plain natural language, it is human-readable, editable, and well suited to collaboration, which supports 3D content distribution and co-creation in open virtual worlds.
📝 Abstract
Traditional methods for 3D object compression operate only on structural information within the object's vertices, polygons, and textures. These methods are effective at compression rates up to 10x for standard object sizes but quickly deteriorate at higher compression rates, producing texture artifacts, low polygon counts, and mesh gaps. In contrast, semantic compression ignores structural information and operates directly on the core concepts to push to extreme levels of compression. In addition, it uses natural language as its storage format, which makes it natively human-readable and a natural fit for emerging applications built around large-scale, collaborative projects within augmented and virtual reality. Semantic compression deprioritizes structural information like location, size, and orientation and predicts the missing information with state-of-the-art deep generative models. In this work, we construct a pipeline for 3D semantic compression from public generative models and explore the quality-compression frontier for 3D object compression. We apply this pipeline to achieve rates as high as 105x for 3D objects taken from the Objaverse dataset and show that semantic compression can outperform traditional methods in the important quality-preserving region around 100x compression.
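To make the notion of a compression rate concrete when natural language is the storage format, the sketch below assumes the rate is the original asset size in bytes divided by the UTF-8 size of the text description. The abstract does not state this exact definition, and the function name, the mesh size, and the description are all hypothetical values chosen for illustration.

```python
def compression_ratio(original_size_bytes: int, description: str) -> float:
    """Assumed definition: original asset bytes / UTF-8 bytes of the description."""
    compressed_size_bytes = len(description.encode("utf-8"))
    if compressed_size_bytes == 0:
        raise ValueError("description must not be empty")
    return original_size_bytes / compressed_size_bytes


if __name__ == "__main__":
    # Hypothetical example: a mesh file and a short caption standing in for it.
    mesh_size_bytes = 20_000
    description = "a small wooden chair with four legs"
    ratio = compression_ratio(mesh_size_bytes, description)
    print(f"compression ratio: {ratio:.1f}x")
```

Under this assumed definition, a longer and more detailed description lowers the ratio, which matches the quality-compression trade-off the abstract describes.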