Monocular Depth Estimation via Neural Network with Learnable Algebraic Group and Ring Structures

📅 2026-04-27
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the limited generalization of existing monocular depth estimation methods under viewpoint changes, which the authors attribute to neglect of the algebraic and geometric structures induced by perspective projection. The paper proposes LAGRNet, a framework that embeds learnable group, ring, and sheaf structures into the network to enforce projective equivariance and topological consistency. Its three components are a Group-defined Feature Manifold (GFM), parameterized by a learned algebraic group action; a Ring Convolution Layer (RCL), which formulates cross-scale feature fusion as a graded ring homomorphism; and a Sheaf-based Module (SM), which aggregates local depth cues via a Čech nerve on the image topology. In zero-shot evaluations on KITTI, NYU-Depth V2, and ETH3D, the authors report that LAGRNet outperforms state-of-the-art methods in both accuracy and cross-domain generalization.
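
For readers unfamiliar with the terms, the two algebraic properties named in the summary have standard textbook definitions, written out below. These are the general definitions only. The summary does not say which group $G$ the paper learns or what the grading indexes, so reading the grade $i$ as a feature scale is an assumption made here for illustration.

```latex
% Equivariance: a map f : X -> Y commutes with the action of a group G.
f(g \cdot x) = g \cdot f(x) \qquad \text{for all } g \in G,\ x \in X .

% Graded rings: R and S decompose into graded pieces that multiply additively in degree.
R = \bigoplus_{i \ge 0} R_i, \qquad S = \bigoplus_{i \ge 0} S_i, \qquad R_i R_j \subseteq R_{i+j} .

% Graded ring homomorphism: a ring homomorphism that preserves the grading.
\varphi(a + b) = \varphi(a) + \varphi(b), \qquad
\varphi(ab) = \varphi(a)\,\varphi(b), \qquad
\varphi(R_i) \subseteq S_i .
```

Under the assumed reading, the last condition says that fusing features never moves information to a different scale index, while the multiplicative condition says that fusing a product of features gives the product of the fused features.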

📝 Abstract
Monocular depth estimation (MDE) has witnessed remarkable progress driven by convolutional neural networks and transformer-based architectures. However, these approaches typically treat the problem as a generic image-to-image regression on Euclidean grids, thereby overlooking the intrinsic algebraic and geometric structures induced by perspective projection. To address this limitation, we propose LAGRNet, a novel framework that grounds MDE in algebraic geometry by explicitly embedding learnable group, ring, and sheaf structures into the deep learning pipeline. Modeling feature maps as sections of a sheaf over an approximated image manifold, our method first establishes a Group-defined Feature Manifold (GFM) parameterized by a learned algebraic group action to enforce projective equivariance and robustness against view changes. To facilitate algebraically consistent cross-scale interactions, we then introduce a Ring Convolution Layer (RCL) that formulates feature fusion as a graded ring homomorphism. Furthermore, to ensure global topological consistency, a Sheaf-based Module (SM) aggregates local depth cues via a Čech nerve on the image topology. Extensive zero-shot evaluations across the KITTI, NYU-Depth V2, and ETH3D benchmarks demonstrate that LAGRNet significantly outperforms state-of-the-art methods in both accuracy and generalization capabilities.
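
As a rough illustration of the Čech nerve mentioned in the abstract, the sketch below covers a small image grid with overlapping square patches and records which sets of patches share a common region. It is not the paper's Sheaf-based Module: the patch cover, the helper names (`make_patches`, `intersect`, `cech_nerve`), and the parameters are all hypothetical, chosen only to show what the nerve of a cover is.

```python
from itertools import combinations


def make_patches(width, height, size, stride):
    """Cover a width x height grid with square patches (x0, y0, x1, y1), half-open."""
    xs = list(range(0, max(width - size, 0) + 1, stride))
    ys = list(range(0, max(height - size, 0) + 1, stride))
    return [(x, y, x + size, y + size) for y in ys for x in xs]


def intersect(a, b):
    """Return the intersection of two half-open boxes, or None if it is empty."""
    x0, y0 = max(a[0], b[0]), max(a[1], b[1])
    x1, y1 = min(a[2], b[2]), min(a[3], b[3])
    return (x0, y0, x1, y1) if x0 < x1 and y0 < y1 else None


def cech_nerve(patches, max_dim=2):
    """Map each dimension k to the index tuples of k+1 patches with a common region."""
    nerve = {0: [(i,) for i in range(len(patches))]}
    for k in range(1, max_dim + 1):
        simplices = []
        for combo in combinations(range(len(patches)), k + 1):
            region = patches[combo[0]]
            for j in combo[1:]:
                region = intersect(region, patches[j])
                if region is None:
                    break
            if region is not None:
                simplices.append(combo)
        nerve[k] = simplices
    return nerve


# Hypothetical toy cover: three 4x4 patches sliding across an 8x4 grid.
patches = make_patches(width=8, height=4, size=4, stride=2)
nerve = cech_nerve(patches)
print(nerve[1])  # pairs of patches that overlap
```

In this toy cover only neighbouring patches overlap, so the nerve is a path on three vertices with no triangles. In a sheaf-style aggregation, the edges would mark where local depth predictions have to agree before they can be glued into a global map.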
Problem

Research questions and friction points this paper is trying to address.

Monocular Depth Estimation
Algebraic Structure
Geometric Structure
Perspective Projection
Depth Estimation
Innovation

Methods, ideas, or system contributions that make the work stand out.

algebraic group
ring convolution
sheaf theory
projective equivariance
monocular depth estimation
Qianlei Wang
Chengdu Institute of Computer Applications, Chinese Academy of Sciences, Chengdu 610213 China; School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 101408 China
Kexun Chen
School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 611756 China
Shaolin Zhang
Chengdu Institute of Computer Applications, Chinese Academy of Sciences, Chengdu 610213 China; School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 101408 China
Hongli Gao
School of Mechanical Engineering, Southwest Jiaotong University, Chengdu 611756 China
Chaoning Zhang
Professor at UESTC (电子科技大学, China)
Computer Vision, LLM and VLM, GenAI and AIGC Detection
Xiaolin Qin
Chengdu Institute of Computer Applications, Chinese Academy of Sciences, Chengdu 610213 China; School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 101408 China