🤖 AI Summary
Existing feedforward 3D reconstruction methods exhibit poor generalization and low accuracy in autonomous driving scenarios, primarily due to: (1) rigid viewpoint transformations that cannot adapt to diverse camera configurations; and (2) limited view overlap and scene complexity in 360° sparse-view settings, hindering geometric consistency modeling. This paper proposes a unified cylindrical lifting framework. Its core contributions are: (i) a parameterizable unified cylindrical camera model enabling zero-shot cross-configuration transfer; (ii) Cylinder Plane Feature Groups (CPFG) with a hybrid representation mechanism to enhance geometric perception under sparse views; and (iii) a feedforward Transformer architecture integrated with a multi-view fusion module for efficient end-to-end reconstruction. Extensive experiments demonstrate state-of-the-art performance across multiple benchmarks, achieving significant improvements in both reconstruction accuracy and cross-scene generalization.
📝 Abstract
Recently, more attention has been paid to feedforward reconstruction paradigms, which mainly learn a fixed view transformation implicitly and reconstruct the scene with a single representation. However, their generalization capability and reconstruction accuracy are still limited when reconstructing driving scenes, for two reasons: (1) The fixed view transformation fails when the camera configuration changes, limiting the generalization capability across different driving scenes equipped with different camera configurations. (2) The small overlapping regions between sparse views of the 360° panorama and the complexity of driving scenes increase the learning difficulty, reducing the reconstruction accuracy. To handle these difficulties, we propose **XYZCylinder**, a feedforward model based on a unified cylinder lifting method which involves camera modeling and feature lifting. Specifically, to improve the generalization capability, we design a Unified Cylinder Camera Modeling (UCCM) strategy, which avoids the learning of viewpoint-dependent spatial correspondence and unifies different camera configurations with adjustable parameters. To improve the reconstruction accuracy, we propose a hybrid representation with several dedicated modules based on a newly designed Cylinder Plane Feature Group (CPFG) to lift 2D image features to 3D space. Experimental results show that XYZCylinder achieves state-of-the-art performance under different evaluation settings, and can be generalized to other driving scenes in a zero-shot manner. Project page: [here](https://yuyuyu223.github.io/XYZCYlinder-projectpage/).
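To make the cylinder-lifting idea concrete, here is a minimal illustrative sketch of projecting 3D points onto an idealized cylindrical image surface, the kind of viewpoint-independent mapping a unified cylinder camera model relies on. This is not the paper's actual UCCM formulation; the function name, the fixed unit-radius cylinder, and the image resolution are all assumptions made for illustration.

```python
import numpy as np

def cylinder_project(points, radius=1.0, v_range=(-1.0, 1.0),
                     width=1024, height=256):
    """Map 3D points (N, 3) to pixel coords (N, 2) on a cylindrical image.

    Illustrative sketch: azimuth around the cylinder axis maps to the
    horizontal pixel axis; the height where the viewing ray crosses the
    cylinder surface maps to the vertical pixel axis. The cylinder axis
    is assumed to be the z-axis through the origin.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    theta = np.arctan2(y, x)                       # azimuth in [-pi, pi)
    d = np.sqrt(x**2 + y**2)                       # horizontal distance
    v = z * radius / np.maximum(d, 1e-8)           # height on the surface
    u_px = (theta + np.pi) / (2 * np.pi) * width   # wrap azimuth to pixels
    v_px = (v - v_range[0]) / (v_range[1] - v_range[0]) * height
    return np.stack([u_px, v_px], axis=1)
```

Because the target surface is a single cylinder with adjustable parameters (radius, vertical range, resolution) rather than a per-camera image plane, features from cameras with different intrinsics and mounting poses can be resampled into one shared representation, which is the intuition behind unifying camera configurations.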