LiftFormer: Lifting and Frame Theory Based Monocular Depth Estimation Using Depth and Edge Oriented Subspace Representation

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Monocular depth estimation is inherently ill-posed, making accurate recovery of 3D structure challenging. This work proposes a novel approach grounded in lifting theory and frame theory: it introduces lifting theory to construct an intermediate subspace that maps color features to depth values, and leverages frame theory to achieve a redundant yet robust depth representation. Furthermore, an edge-aware subspace is designed to enhance depth prediction accuracy near object boundaries. The method achieves state-of-the-art performance across multiple benchmark datasets, with ablation studies confirming the effectiveness of each component, particularly demonstrating superior reconstruction quality in fine edge details.
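The summary credits frame theory with giving a representation that is redundant yet robust. As a generic illustration of that idea, and not the paper's DGR construction, the sketch below uses the classic "Mercedes-Benz" tight frame: three unit vectors in the plane spaced 120 degrees apart. They are linearly dependent (they sum to zero), yet any 2-D vector can be exactly reconstructed from its three frame coefficients. The names `FRAME`, `FRAME_BOUND`, `analyze`, and `synthesize` are hypothetical.

```python
import math

# Hypothetical example frame: three unit vectors in R^2 spaced 120 degrees apart.
# Three vectors in a 2-D space are necessarily linearly dependent (redundant).
FRAME = [(math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3)) for k in range(3)]

# This frame is tight: sum_k <x, phi_k>^2 = A * ||x||^2 with frame bound A = 3/2.
FRAME_BOUND = 1.5


def analyze(x):
    """Analysis step: inner products of x with every frame vector."""
    return [x[0] * p[0] + x[1] * p[1] for p in FRAME]


def synthesize(coeffs):
    """Tight-frame reconstruction: x = (1 / A) * sum_k c_k * phi_k."""
    return (
        sum(c * p[0] for c, p in zip(coeffs, FRAME)) / FRAME_BOUND,
        sum(c * p[1] for c, p in zip(coeffs, FRAME)) / FRAME_BOUND,
    )


x = (0.3, -1.2)
coeffs = analyze(x)          # three coefficients for a 2-D vector
print(synthesize(coeffs))    # approximately (0.3, -1.2)
```

Because the frame is tight with bound A = 3/2 and synthesis is linear, perturbing a single coefficient by δ moves the reconstruction by only |δ|/A, i.e. two thirds of δ; the redundancy spreads information across more coefficients than the dimension strictly requires.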
📝 Abstract
Monocular depth estimation (MDE) has attracted increasing interest in the past few years owing to its important role in 3D vision. MDE estimates a depth map from a monocular image or video to represent the 3D structure of a scene, which is a highly ill-posed problem. To address this problem, we propose LiftFormer, which is based on lifting theory from topology and constructs two subspaces: an intermediate subspace that bridges image color features and depth values, and a subspace that enhances depth prediction around edges. MDE is formulated by recasting depth value prediction as feature representation in a depth-oriented geometric representation (DGR) subspace, thereby bridging the learning from color values to geometric depth values. The DGR subspace is constructed based on frame theory using linearly dependent vectors corresponding to depth bins, which provides a redundant and robust representation. Image spatial features are transformed into the DGR subspace, where they correspond directly to depth values. Moreover, since edges usually exhibit sharp changes in a depth map and tend to be mispredicted, an edge-aware representation (ER) subspace is constructed, into which depth features are transformed and then used to enhance local features around edges. Experimental results demonstrate that LiftFormer achieves state-of-the-art performance on widely used datasets, and an ablation study validates the effectiveness of both proposed lifting modules.
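The abstract states that image features are transformed into a DGR subspace spanned by linearly dependent vectors tied to depth bins, where they correspond directly to depth values, but it does not give the equations. The sketch below assumes one common bin-based scheme: score each bin by the inner product of a pixel feature with that bin's vector, normalize the scores with a softmax, and output the expected bin center (similar in spirit to AdaBins). It is not the paper's implementation; uniform bins and the names `bin_centers`, `softmax`, and `predict_depth` are hypothetical.

```python
import math


def bin_centers(d_min, d_max, num_bins):
    """Uniformly spaced depth-bin centers over [d_min, d_max] (assumed, not from the paper)."""
    width = (d_max - d_min) / num_bins
    return [d_min + (k + 0.5) * width for k in range(num_bins)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_depth(feature, bin_vectors, centers):
    """Project a pixel feature onto per-bin vectors, then blend the bin centers.

    feature: D floats. bin_vectors: K lists of D floats with K > D, so the
    vectors are linearly dependent (a redundant, frame-like set).
    Returns a convex combination of the bin centers.
    """
    scores = [sum(f * v for f, v in zip(feature, vec)) for vec in bin_vectors]
    weights = softmax(scores)
    return sum(w * c for w, c in zip(weights, centers))


centers = bin_centers(0.0, 10.0, 4)                      # [1.25, 3.75, 6.25, 8.75]
vecs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]  # K = 4 vectors in D = 2
print(predict_depth([50.0, 0.0], vecs, centers))         # close to 1.25
```

With four bin vectors in two dimensions, a feature of (50, 0) puts nearly all of the softmax weight on the first bin, so the predicted depth is close to that bin's center, 1.25; a zero feature weights all bins equally and returns their mean, 5.0. Because the output is a convex combination of bin centers, it always stays inside [d_min, d_max].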
Problem

Research questions and friction points this paper is trying to address.

Monocular depth estimation
ill-posed problem
depth map
edge prediction
3D vision
Innovation

Methods, ideas, or system contributions that make the work stand out.

LiftFormer
lifting theory
frame theory
depth-oriented geometric representation
edge-aware representation
Authors
Shuai Li
Shandong University
IndRNN, image/video coding, 3D video processing, computer vision, deep learning
Huibin Bai
School of Control Science and Engineering, Shandong University, and Key Laboratory of Machine Intelligence and System Control, Ministry of Education, Jinan 250100, China
Yanbo Gao
Shandong University
Video Coding, 3D Video Processing, Deep Learning
Chong Lv
School of Control Science and Engineering, Shandong University, and Key Laboratory of Machine Intelligence and System Control, Ministry of Education, Jinan 250100, China
Hui Yuan
School of Control Science and Engineering, Shandong University, and Key Laboratory of Machine Intelligence and System Control, Ministry of Education, Jinan 250100, China
Chuankun Li
State Key Laboratory of Dynamic Testing Technology and School of Information and Communication Engineering, North University of China, Taiyuan 030051, China
Wei Hua
Research Institute of Interdisciplinary Innovation, Zhejiang Lab, Hangzhou, China
Tian Xie
Research Institute of Interdisciplinary Innovation, Zhejiang Lab, Hangzhou, China