🤖 AI Summary
Existing feedforward 3D reconstruction methods exhibit poor generalization and low accuracy in autonomous driving scenarios, primarily due to: (1) rigid viewpoint transformations that cannot adapt to diverse camera configurations; and (2) limited view overlap and scene complexity in 360° sparse-view settings, hindering geometric consistency modeling. This paper proposes a unified cylindrical lifting framework. Its core contributions are: (i) a parameterizable unified cylindrical camera model enabling zero-shot cross-configuration transfer; (ii) Cylinder Plane Feature Groups (CPFG) with a hybrid representation mechanism to enhance geometric perception under sparse views; and (iii) a feedforward Transformer architecture integrated with a multi-view fusion module for efficient end-to-end reconstruction. Extensive experiments demonstrate state-of-the-art performance across multiple benchmarks, achieving significant improvements in both reconstruction accuracy and cross-scene generalization.
📝 Abstract
Recently, more attention has been paid to feedforward reconstruction paradigms, which mainly learn a fixed view transformation implicitly and reconstruct the scene with a single representation. However, their generalization capability and reconstruction accuracy are still limited when reconstructing driving scenes, for two reasons: (1) The fixed view transformation fails when the camera configuration changes, limiting the generalization capability across different driving scenes equipped with different camera configurations. (2) The small overlapping regions between sparse views of the 360° panorama and the complexity of driving scenes increase the learning difficulty, reducing the reconstruction accuracy. To handle these difficulties, we propose **XYZCylinder**, a feedforward model based on a unified cylinder lifting method which involves camera modeling and feature lifting. Specifically, to improve the generalization capability, we design a Unified Cylinder Camera Modeling (UCCM) strategy, which avoids the learning of viewpoint-dependent spatial correspondence and unifies different camera configurations with adjustable parameters. To improve the reconstruction accuracy, we propose a hybrid representation with several dedicated modules based on a newly designed Cylinder Plane Feature Group (CPFG) to lift 2D image features to 3D space. Experimental results show that XYZCylinder achieves state-of-the-art performance under different evaluation settings, and can be generalized to other driving scenes in a zero-shot manner. Project page: [here](https://yuyuyu223.github.io/XYZCYlinder-projectpage/).
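To make the cylinder-lifting idea concrete, here is a minimal illustrative sketch of projecting 3D points onto an idealized cylindrical image surface, the kind of viewpoint-independent mapping a unified cylinder camera model relies on. This is not the paper's actual UCCM formulation; the function name, the fixed unit-radius cylinder, and the image resolution are all assumptions made for illustration.

```python
import numpy as np

def cylinder_project(points, radius=1.0, v_range=(-1.0, 1.0),
                     width=1024, height=256):
    """Map 3D points (N, 3) to pixel coords (N, 2) on a cylindrical image.

    Illustrative sketch: azimuth around the cylinder axis maps to the
    horizontal pixel axis; the height where the viewing ray crosses the
    cylinder surface maps to the vertical pixel axis. The cylinder axis
    is assumed to be the z-axis through the origin.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    theta = np.arctan2(y, x)                       # azimuth in [-pi, pi)
    d = np.sqrt(x**2 + y**2)                       # horizontal distance
    v = z * radius / np.maximum(d, 1e-8)           # height on the surface
    u_px = (theta + np.pi) / (2 * np.pi) * width   # wrap azimuth to pixels
    v_px = (v - v_range[0]) / (v_range[1] - v_range[0]) * height
    return np.stack([u_px, v_px], axis=1)
```

Because the target surface is a single cylinder with adjustable parameters (radius, vertical range, resolution) rather than a per-camera image plane, features from cameras with different intrinsics and mounting poses can be resampled into one shared representation, which is the intuition behind unifying camera configurations.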