PLANA3R: Zero-shot Metric Planar 3D Reconstruction via Feed-Forward Planar Splatting

📅 2025-10-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses metric-scale 3D planar reconstruction from uncalibrated, dual-view indoor images—without explicit plane annotations. The method leverages inherent geometric regularity in indoor scenes, treating planes as fundamental primitives. A Vision Transformer jointly estimates relative camera pose, depth, and surface normal maps, while differentiable plane splatting enables end-to-end optimization. Crucially, it is the first approach to achieve zero-shot, self-supervised planar-structured geometric reconstruction using only large-scale synthetic depth/normal renderings—eliminating reliance on ground-truth 3D plane annotations. Evaluated across multiple indoor benchmarks, it significantly outperforms existing feedforward methods, achieving state-of-the-art performance in surface reconstruction, depth estimation, relative pose estimation, and planar segmentation. Moreover, it demonstrates strong cross-domain generalization capability.

📝 Abstract
This paper addresses metric 3D reconstruction of indoor scenes by exploiting their inherent geometric regularities with compact representations. Using planar 3D primitives - a representation well suited to man-made environments - we introduce PLANA3R, a pose-free framework for metric Planar 3D Reconstruction from unposed two-view images. Our approach employs Vision Transformers to extract a set of sparse planar primitives, estimate relative camera poses, and supervise geometry learning via planar splatting, where gradients are propagated through high-resolution rendered depth and normal maps of the primitives. Unlike prior feedforward methods that require 3D plane annotations during training, PLANA3R learns planar 3D structures without explicit plane supervision, enabling scalable training on large-scale stereo datasets using only depth and normal annotations. We validate PLANA3R on multiple indoor-scene datasets with metric supervision and demonstrate strong generalization to out-of-domain indoor environments across diverse tasks under metric evaluation protocols, including 3D surface reconstruction, depth estimation, and relative pose estimation. Furthermore, by formulating reconstruction with a planar 3D representation, our method naturally acquires the ability to perform accurate plane segmentation. The project page is available at https://lck666666.github.io/plana3r
Problem

Research questions and friction points this paper is trying to address.

Metric 3D reconstruction from unposed two-view indoor images
Learning planar 3D structures without explicit plane supervision
Generalizing reconstruction across diverse tasks and domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Vision Transformers for planar primitive extraction
Employs planar splatting for self-supervised geometry learning
Learns 3D structures without explicit plane supervision
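The rendering step at the heart of planar splatting can be illustrated with a minimal depth rasterizer: each primitive is a plane n · X = d in camera coordinates, and the depth at each pixel is the nearest positive ray-plane intersection. This NumPy sketch is an illustrative assumption - the function name, hard z-min compositing, and lack of differentiability are simplifications; the paper's actual splatting is differentiable and renders normal maps as well as depth.

```python
import numpy as np

def splat_planes_depth(planes, K, hw):
    """Render a depth map from a set of 3D planes by ray-plane
    intersection, keeping the nearest hit per pixel (z-buffering).

    planes: list of (normal, offset) pairs with plane equation
            n . X = d in camera coordinates.
    K:      3x3 pinhole intrinsics matrix.
    hw:     (height, width) of the output depth map.
    """
    H, W = hw
    # Back-project the pixel grid to rays: r = K^{-1} [u, v, 1]^T.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T
    rays = np.linalg.inv(K) @ pix.astype(float)   # (3, H*W), unit z
    depth = np.full(H * W, np.inf)
    for n, d in planes:
        # Ray-plane intersection: t (n . r) = d  =>  t = d / (n . r).
        denom = n @ rays
        t = np.where(np.abs(denom) > 1e-8, d / denom, np.inf)
        t = np.where(t > 0, t, np.inf)            # keep hits in front of the camera
        # Rays have unit z, so t is already depth along the optical axis.
        depth = np.minimum(depth, t)
    return depth.reshape(H, W)
```

A differentiable variant would replace the hard `np.minimum` z-buffer with a soft (e.g. softmin-weighted) composition so that gradients flow back to the plane parameters, which is what enables end-to-end supervision from rendered depth and normal maps.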
Changkun Liu
The Hong Kong University of Science and Technology
Bin Tan
Ph.D. Student, Wuhan University
Computer Vision
Zeran Ke
Wuhan University
Shangzhan Zhang
Zhejiang University
Jiachen Liu
The Pennsylvania State University
Ming Qian
Wuhan University
Nan Xue
Ant Group
Yujun Shen
Ant Group
Generative Modeling, Computer Vision, Deep Learning
Tristan Braud
The Hong Kong University of Science and Technology