🤖 AI Summary
Existing implicit-field methods struggle to combine high resolution (up to 1024³), arbitrary topology (including open surfaces and complex interiors), and mesh reconstruction supervised purely by rendering losses. This paper introduces SparseFlex, a sparse-structured isosurface representation for differentiable, high-fidelity mesh reconstruction and generation with arbitrary topology. The method has three key components: (i) a sparse voxel structure combined with FlexiCubes, which concentrates differentiable isosurface extraction on surface-adjacent regions; (ii) a frustum-aware sectional voxel training strategy that activates only the voxels relevant to each rendered view, cutting memory use and enabling, for the first time, reconstruction of mesh interiors from rendering supervision alone; and (iii) a generative pipeline that pairs a variational autoencoder (VAE) with a rectified flow transformer. Compared with previous methods, the approach reduces Chamfer Distance by about 82% and raises F-score by about 88%, and it generates high-resolution, detailed 3D shapes with arbitrary topology.
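To make the two reported metrics concrete, here is a minimal sketch of Chamfer Distance and F-score between two point sets sampled from a predicted and a reference surface. The summary does not state the paper's exact conventions, so the choices below are assumptions: unsquared Euclidean distances, a symmetric sum of the two directional means, and a user-chosen threshold `tau`. The function names are illustrative and not taken from the SparseFlex code.

```python
import math


def nearest_distances(src, dst):
    """For each point in src, the Euclidean distance to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def chamfer_distance(pred, gt):
    """Symmetric Chamfer Distance: mean pred->gt distance plus mean gt->pred distance.

    Assumption: unsquared distances; some papers use squared distances instead.
    """
    d_pred = nearest_distances(pred, gt)
    d_gt = nearest_distances(gt, pred)
    return sum(d_pred) / len(d_pred) + sum(d_gt) / len(d_gt)


def f_score(pred, gt, tau):
    """Harmonic mean of precision and recall at distance threshold tau."""
    d_pred = nearest_distances(pred, gt)
    d_gt = nearest_distances(gt, pred)
    precision = sum(d <= tau for d in d_pred) / len(d_pred)  # predicted points near the reference
    recall = sum(d <= tau for d in d_gt) / len(d_gt)         # reference points near the prediction
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy point sets (made up for illustration): two points match, one is misplaced.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 3.0, 0.0)]

cd = chamfer_distance(pred, gt)
fs = f_score(pred, gt, tau=0.5)
```

The brute-force nearest-neighbor search is quadratic in the number of points; practical evaluations on dense samples typically use a spatial index. Read this way, an 82% reduction in Chamfer Distance means the new value is about 0.18 times the baseline, and an 88% increase in F-score means about 1.88 times the baseline.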
📝 Abstract
Creating high-fidelity 3D meshes with arbitrary topology, including open surfaces and complex interiors, remains a significant challenge. Existing implicit field methods often require costly and detail-degrading watertight conversion, while other approaches struggle with high resolutions. This paper introduces SparseFlex, a novel sparse-structured isosurface representation that enables differentiable mesh reconstruction at resolutions up to $1024^3$ directly from rendering losses. SparseFlex combines the accuracy of FlexiCubes with a sparse voxel structure, focusing computation on surface-adjacent regions and efficiently handling open surfaces. Crucially, we introduce a frustum-aware sectional voxel training strategy that activates only relevant voxels during rendering, dramatically reducing memory consumption and enabling high-resolution training. This also allows, for the first time, the reconstruction of mesh interiors using only rendering supervision. Building upon this, we demonstrate a complete shape modeling pipeline by training a variational autoencoder (VAE) and a rectified flow transformer for high-quality 3D shape generation. Our experiments show state-of-the-art reconstruction accuracy, with a ~82% reduction in Chamfer Distance and a ~88% increase in F-score compared to previous methods, and demonstrate the generation of high-resolution, detailed 3D shapes with arbitrary topology. By enabling high-resolution, differentiable mesh reconstruction and generation with rendering losses, SparseFlex significantly advances the state of the art in 3D shape representation and modeling.
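The frustum-aware idea of activating only the voxels relevant to a rendered view can be sketched as a simple culling step over a sparse voxel list. This is an illustrative reading of the abstract, not the authors' implementation: the `Camera` class, the `select_active_voxels` function, the mapping of voxel indices into a unit cube, and the center-point containment test are all assumptions. Treating a narrow near/far depth range as a "section" that exposes interior voxels is likewise an assumed interpretation.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Camera:
    """Hypothetical pinhole camera; every field name here is illustrative."""
    eye: tuple
    forward: tuple
    up: tuple
    fov_y_deg: float
    aspect: float
    near: float
    far: float


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _normalize(a):
    n = math.sqrt(_dot(a, a))
    return (a[0] / n, a[1] / n, a[2] / n)


def in_frustum(point, cam):
    """True if the point lies between the near/far planes and inside the field of view."""
    f = _normalize(cam.forward)
    r = _normalize(_cross(f, cam.up))  # sideways axis (sign is irrelevant, we use abs below)
    u = _cross(r, f)                   # camera up axis, orthogonal to f and r
    v = _sub(point, cam.eye)
    depth = _dot(v, f)
    if depth < cam.near or depth > cam.far:
        return False
    half_h = depth * math.tan(math.radians(cam.fov_y_deg) / 2)
    half_w = half_h * cam.aspect
    return abs(_dot(v, u)) <= half_h and abs(_dot(v, r)) <= half_w


def select_active_voxels(voxels, resolution, cam):
    """Keep only sparse voxels whose centers fall inside the camera frustum.

    Assumption: integer indices (i, j, k) map to centers in the cube [-0.5, 0.5]^3.
    """
    def center(idx):
        return tuple((c + 0.5) / resolution - 0.5 for c in idx)

    return [idx for idx in voxels if in_frustum(center(idx), cam)]


# Toy sparse grid at resolution 4 (made-up indices).
voxels = [(1, 1, 0), (0, 0, 0), (2, 2, 3), (1, 2, 2)]
full_view = Camera(eye=(0.0, 0.0, -2.0), forward=(0.0, 0.0, 1.0), up=(0.0, 1.0, 0.0),
                   fov_y_deg=20.0, aspect=1.0, near=0.1, far=10.0)
# A thin depth slab stands in for a "section" through the object's interior.
section_view = replace(full_view, near=2.0, far=2.2)

active_full = select_active_voxels(voxels, 4, full_view)
active_section = select_active_voxels(voxels, 4, section_view)
```

In this toy setup the full view keeps three of the four voxels, while the thin slab keeps only the one whose center lies inside the chosen depth range. In a training loop, only the selected voxels would need to participate in isosurface extraction and rendering for that view, which is where the memory savings described in the abstract would come from.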