PlaneRecTR++: Unified Query Learning for Joint 3D Planar Reconstruction and Pose Estimation

📅 2023-07-25
📈 Citations: 2
Influential: 0
🤖 AI Summary
Existing monocular multi-view 3D plane reconstruction methods adopt a fragmented, multi-module paradigm, leading to poor inter-task coordination and suboptimal performance. This paper proposes the first end-to-end, single-stage unified framework that jointly optimizes plane detection, segmentation, parameter regression, inter-frame association, and 6-DoF camera pose estimation. Our core innovation is a learnable plane-query-based Transformer architecture that eliminates reliance on initial pose priors or manually annotated plane correspondences. We further introduce a multi-task joint loss and self-supervised cross-view consistency modeling to enforce geometric coherence across views. Extensive experiments on ScanNetv1/v2, NYUv2-Plane, and Matterport3D demonstrate consistent and significant improvements over state-of-the-art methods across all sub-tasks, with strong positive synergistic effects observed between modules.
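The plane-query formulation described above follows the set-prediction recipe common to query-based Transformers: each learnable query predicts a plane probability, a segmentation mask, and plane parameters, and predictions are matched one-to-one to ground-truth planes before the multi-task loss is applied. A minimal numpy/scipy sketch of that matching step, assuming Hungarian assignment over a weighted parameter-plus-mask cost (shapes, cost terms, and weights are illustrative assumptions, not the paper's exact design):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_plane_queries(pred_params, pred_masks, gt_params, gt_masks,
                        w_param=1.0, w_mask=1.0):
    """Hungarian-match N predicted plane queries to M ground-truth planes.

    pred_params: (N, 3) predicted plane parameters (e.g. normal/offset vectors).
    pred_masks:  (N, H, W) predicted mask probabilities in [0, 1].
    gt_params:   (M, 3) ground-truth plane parameters.
    gt_masks:    (M, H, W) binary ground-truth masks.
    Returns (query_idx, gt_idx) index arrays of the optimal assignment.
    """
    # L1 distance between plane parameters: (N, M)
    param_cost = np.abs(pred_params[:, None] - gt_params[None]).sum(-1)
    # Soft-IoU-style mask cost: (N, M)
    inter = np.einsum('nhw,mhw->nm', pred_masks, gt_masks)
    union = (pred_masks.sum((1, 2))[:, None]
             + gt_masks.sum((1, 2))[None] - inter)
    mask_cost = 1.0 - inter / np.maximum(union, 1e-6)
    cost = w_param * param_cost + w_mask * mask_cost
    return linear_sum_assignment(cost)
```

Under this scheme, the multi-task loss would be computed only on matched query-plane pairs, with unmatched queries supervised toward a "no plane" class.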
📝 Abstract
3D plane reconstruction from images can usually be divided into several per-frame sub-tasks of plane detection, segmentation, parameter regression, and possibly depth prediction, along with plane correspondence and relative camera pose estimation between frames. Previous works tend to divide and conquer these sub-tasks with distinct network modules, overall formulated as a two-stage paradigm: with an initial camera pose and per-frame plane predictions provided by the first stage, exclusively designed modules, potentially relying on extra plane correspondence labelling, are applied to merge multi-view plane entities and produce the 6DoF camera pose. As no existing work manages to integrate these closely related sub-tasks into a unified framework, treating them separately and sequentially instead, we suspect this fragmentation to be a main source of performance limitation for existing approaches. Motivated by this finding and the success of query-based learning in enriching reasoning among semantic entities, in this paper we propose PlaneRecTR++, a Transformer-based architecture which, for the first time, unifies all sub-tasks related to multi-view reconstruction and pose estimation within a compact single-stage model, refraining from initial pose estimation and plane correspondence supervision. Extensive quantitative and qualitative experiments demonstrate that our proposed unified learning achieves mutual benefits across sub-tasks, obtaining new state-of-the-art performance on the public ScanNetv1, ScanNetv2, NYUv2-Plane, and MatterPort3D datasets.
Problem

Research questions and friction points this paper is trying to address.

Unifies multi-view 3D planar reconstruction and pose estimation
Eliminates need for initial pose and correspondence supervision
Integrates segmentation, detection, and correspondence in single framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified single-stage Transformer architecture for all tasks
Eliminates initial pose estimation and correspondence supervision
Query-based learning enables cross-task semantic reasoning
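Because the framework avoids manually annotated plane correspondences, cross-view plane association can emerge from similarity between learned plane query embeddings. A hypothetical numpy sketch of mutual-nearest-neighbour association (function name, threshold, and the greedy scheme are illustrative assumptions, not the paper's actual mechanism):

```python
import numpy as np

def associate_planes(emb_a, emb_b, sim_thresh=0.5):
    """Associate plane query embeddings across two views.

    emb_a: (Na, D) per-plane embeddings from view A.
    emb_b: (Nb, D) per-plane embeddings from view B.
    Returns a list of (i, j) pairs whose cosine similarity exceeds
    sim_thresh and that are mutual nearest neighbours, so each plane
    is used at most once.
    """
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    sim = a @ b.T  # (Na, Nb) cosine similarities
    pairs = []
    for i in range(sim.shape[0]):
        j = int(np.argmax(sim[i]))
        # keep only mutual best matches above the threshold
        if sim[i, j] > sim_thresh and int(np.argmax(sim[:, j])) == i:
            pairs.append((i, j))
    return pairs
```

Associated plane pairs, together with their predicted parameters, would then constrain the relative 6DoF pose between the two frames.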
Jing Shi
National University of Defense Technology, China
Shuaifeng Zhi
Imperial College London
Kaiyang Xu
National University of Defense Technology, China