Towards In-the-wild 3D Plane Reconstruction from a Single Image

📅 2025-06-03

🤖 AI Summary

Single-image 3D plane reconstruction generalizes poorly across domains because existing methods are typically trained on a single domain (indoor or outdoor). This work introduces ZeroPlane, a framework for zero-shot plane reconstruction that generalizes across indoor and outdoor scenes. Methodologically, it (1) proposes a geometric representation that decouples plane normals from plane offsets; (2) designs an exemplar-guided, classification-then-regression paradigm for learning them; (3) establishes PlaneBench, a large-scale planar benchmark covering 14 datasets and 560K densely annotated planes; and (4) adopts a Transformer architecture that combines advanced image-encoder backbones with a pixel-geometry-enhanced plane embedding module. Evaluated on multiple zero-shot test sets, the approach significantly outperforms state-of-the-art methods in both reconstruction accuracy and generalization, particularly on challenging in-the-wild scenes.

📝 Abstract

3D plane reconstruction from a single image is a crucial yet challenging topic in 3D computer vision. Previous state-of-the-art (SOTA) methods have focused on training their systems on a single dataset from either the indoor or the outdoor domain, limiting their generalizability across diverse testing data. In this work, we introduce a novel framework dubbed ZeroPlane, a Transformer-based model targeting zero-shot 3D plane detection and reconstruction from a single image, over diverse domains and environments. To enable data-driven models across multiple domains, we have curated a large-scale planar benchmark, comprising over 14 datasets and 560,000 high-resolution, dense planar annotations for diverse indoor and outdoor scenes. To address the challenge of achieving desirable planar geometry on multi-dataset training, we propose to disentangle the representation of plane normal and offset, and employ an exemplar-guided, classification-then-regression paradigm to learn plane normal and offset, respectively. Additionally, we employ advanced backbones as the image encoder, and present an effective pixel-geometry-enhanced plane embedding module to further facilitate planar reconstruction. Extensive experiments across multiple zero-shot evaluation datasets have demonstrated that our approach significantly outperforms previous methods on both reconstruction accuracy and generalizability, especially over in-the-wild data. Our code and data are available at: https://github.com/jcliu0428/ZeroPlane.
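The exemplar-guided, classification-then-regression idea can be illustrated with a minimal sketch for plane normals. This is not ZeroPlane's code: the six axis-aligned exemplars and the helper names `normalize`, `encode_normal`, and `decode_normal` are hypothetical stand-ins (a real system would more likely derive its exemplars by clustering training-set normals). The network's job, under this assumption, is to classify which exemplar a plane's normal is closest to and to regress a residual from that exemplar; decoding adds the residual back and renormalizes.

```python
import math


def normalize(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# HYPOTHETICAL exemplar set: six axis-aligned unit normals standing in for
# cluster centres that a real system might learn from training data.
EXEMPLARS = [normalize(v) for v in [
    (0, 1, 0),   # floor-like (camera y axis points down)
    (0, -1, 0),  # ceiling-like
    (1, 0, 0),   # side walls
    (-1, 0, 0),
    (0, 0, 1),   # fronto-parallel surfaces
    (0, 0, -1),
]]


def encode_normal(normal, exemplars=EXEMPLARS):
    """Build a training target: index of the closest exemplar plus a residual."""
    n = normalize(normal)
    # Classification target: exemplar with the largest cosine similarity.
    idx = max(range(len(exemplars)), key=lambda i: dot(n, exemplars[i]))
    # Regression target: offset from that exemplar to the true normal.
    residual = tuple(a - b for a, b in zip(n, exemplars[idx]))
    return idx, residual


def decode_normal(class_logits, residuals, exemplars=EXEMPLARS):
    """Inference: take the top-scoring exemplar and add its predicted residual."""
    idx = max(range(len(class_logits)), key=lambda i: class_logits[i])
    raw = tuple(e + r for e, r in zip(exemplars[idx], residuals[idx]))
    return normalize(raw)  # project back onto the unit sphere
```

Splitting the problem this way means the classifier only chooses among a few coarse directions while the regressor corrects a small residual; that is one plausible reason such a scheme could stay stable when normals come from many datasets. The same pattern could, in principle, be applied to offsets by binning them.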

Problem

Achieving zero-shot 3D plane reconstruction from single images across diverse domains

Overcoming limited generalizability of prior methods trained on single datasets

Enhancing planar geometry accuracy via disentangled normal-offset representation
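To make the normal/offset representation concrete, the sketch below computes the depth that a plane with unit normal n and offset d induces at a pixel. It assumes a pinhole camera with intrinsics fx, fy, cx, cy and the convention n · X = d in camera coordinates (y pointing down); the function name `plane_depth` is hypothetical, and some works instead write planes as n · X + d = 0, which flips the sign of d.

```python
def plane_depth(normal, offset, u, v, fx, fy, cx, cy):
    """Depth z at pixel (u, v) of the plane {X : n . X = d} in the camera frame.

    ASSUMPTIONS: pinhole intrinsics (fx, fy, cx, cy), unit normal `normal`,
    and the sign convention n . X = d.
    """
    # Back-project the pixel to a ray with z = 1; points on it are X = z * ray.
    ray = ((u - cx) / fx, (v - cy) / fy, 1.0)
    # Substituting X = z * ray into n . X = d gives z = d / (n . ray).
    denom = sum(n * r for n, r in zip(normal, ray))
    if abs(denom) < 1e-8:
        return None  # ray (nearly) parallel to the plane: no intersection
    z = offset / denom
    return z if z > 0 else None  # intersection behind the camera is invalid


# Camera 1.5 m above a floor with normal (0, 1, 0) (y points down):
print(plane_depth((0.0, 1.0, 0.0), 1.5, 320, 490, 500, 500, 320, 240))  # 3.0
```

For a fixed normal, depth scales linearly with d, so the offset carries the scene's metric scale while the normal is scale-free; this is one intuition for why representing the two separately may help when training across datasets with very different scene scales.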

Innovation

Transformer-based model for zero-shot 3D plane detection

Large-scale planar benchmark with diverse annotations

Exemplar-guided classification-then-regression paradigm
