🤖 AI Summary
This work addresses the lack of 3D geometric structure in 2D latent spaces, which prevents inverse graphics from being applied to them directly. The authors propose the Inverse Graphics Autoencoder (IG-AE), which regularizes an image autoencoder with 3D geometry by aligning its latent space with jointly trained latent 3D scenes, so that the latents of different views of a scene stay geometrically consistent. Using the trained IG-AE, NeRFs can then be learned entirely in the latent space ("Latent NeRFs") via a training pipeline released as an open-source extension of the Nerfstudio framework, making latent scene learning available to the methods Nerfstudio supports. Experiments confirm that Latent NeRFs trained with IG-AE reach higher reconstruction quality than those trained with a standard autoencoder, while training and rendering faster than NeRFs operating in the image space and interoperating natively with other latent-based 2D vision methods.
📝 Abstract
While pre-trained image autoencoders are increasingly utilized in computer vision, the application of inverse graphics in 2D latent spaces has been under-explored. Yet, besides reducing the training and rendering complexity, applying inverse graphics in the latent space enables valuable interoperability with other latent-based 2D methods. The major challenge is that inverse graphics cannot be directly applied to such image latent spaces because they lack an underlying 3D geometry. In this paper, we propose an Inverse Graphics Autoencoder (IG-AE) that specifically addresses this issue. To this end, we regularize an image autoencoder with 3D geometry by aligning its latent space with jointly trained latent 3D scenes. We utilize the trained IG-AE to bring NeRFs to the latent space with a latent NeRF training pipeline, which we implement in an open-source extension of the Nerfstudio framework, thereby unlocking latent scene learning for its supported methods. We experimentally confirm that Latent NeRFs trained with IG-AE achieve improved quality compared to a standard autoencoder, while exhibiting training and rendering accelerations with respect to NeRFs trained in the image space. Our project page can be found at https://ig-ae.github.io.
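To make the training objective concrete, the sketch below illustrates the latent-space supervision described above: a NeRF's rendered latents are compared against the latents produced by a frozen encoder from the ground-truth view, so the photometric loss lives in the (smaller, cheaper) latent space rather than the pixel space. This is a minimal toy sketch, not the paper's implementation: `encode` (block-averaging downsampler) and `render_latent` (constant output) are hypothetical stand-ins for the actual IG-AE encoder and latent NeRF networks.

```python
import numpy as np

def encode(image):
    """Toy stand-in for the frozen IG-AE encoder: 8x spatial block
    averaging, padded out to a fake 4-channel latent (illustrative only)."""
    h, w, _ = image.shape
    lat = image.reshape(h // 8, 8, w // 8, 8, 3).mean(axis=(1, 3))
    return np.repeat(lat, 2, axis=-1)[..., :4]

def render_latent(params, shape):
    """Toy stand-in for the latent NeRF's render at a given camera pose:
    here just a constant latent image parameterized by `params`."""
    return np.broadcast_to(params, shape)

# Latent-space photometric loss: supervise the NeRF's rendered latents
# against the encoder's latents of the ground-truth view. The latent
# target is 8x smaller per side than the image, hence faster to fit.
image = np.random.rand(64, 64, 3)          # ground-truth view
target = encode(image)                      # (8, 8, 4) latent target
params = np.zeros(4)                        # toy NeRF "parameters"
pred = render_latent(params, target.shape)  # rendered latents
loss = np.mean((pred - target) ** 2)        # optimized w.r.t. the NeRF
```

In the actual pipeline the loss gradient flows into the NeRF's weights while the IG-AE stays frozen, and the decoder is only applied once per rendered frame to map the converged latent render back to RGB.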