AsyncMDE: Real-Time Monocular Depth Estimation via Asynchronous Spatial Memory

📅 2026-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high computational cost of foundation models for monocular depth estimation, which hinders their real-time deployment on edge devices, and the underutilization of inter-frame redundancy in existing approaches. To overcome these limitations, we propose AsyncMDE, the first method to introduce an asynchronous spatial memory mechanism. It decouples computation: a large background model generates high-quality spatial features while a lightweight foreground model performs real-time inference. Cross-frame feature reuse is enabled through an autoregressive memory update and a complementary fusion strategy. With only 3.83 million parameters, AsyncMDE achieves 237 FPS on an RTX 4090—closing 77% of the accuracy gap to the foundation model—and 161 FPS on a Jetson AGX Orin, significantly outperforming current state-of-the-art methods.

📝 Abstract
Foundation-model-based monocular depth estimation offers a viable alternative to active sensors for robot perception, yet its computational cost often prohibits deployment on edge platforms. Existing methods perform independent per-frame inference, leaving unexploited the substantial computational redundancy between adjacent viewpoints in continuous robot operation. This paper presents AsyncMDE, an asynchronous depth perception system consisting of a foundation model and a lightweight model that amortizes the foundation model's computational cost over time. The foundation model produces high-quality spatial features in the background, while the lightweight model runs asynchronously in the foreground, fusing cached memory with current observations through complementary fusion, outputting depth estimates, and autoregressively updating the memory. This enables cross-frame feature reuse with bounded accuracy degradation. With only 3.83M parameters, it operates at 237 FPS on an RTX 4090, recovering 77% of the accuracy gap to the foundation model while achieving a 25× parameter reduction. Validated across indoor static, dynamic, and synthetic extreme-motion benchmarks, AsyncMDE degrades gracefully between refreshes and achieves 161 FPS on a Jetson AGX Orin with TensorRT, clearly demonstrating its feasibility for real-time edge deployment.
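The asynchronous schedule the abstract describes — an expensive background pass that periodically refreshes a spatial memory, while a lightweight foreground model fuses that memory with the current frame and updates it autoregressively — can be sketched as a deterministic simulation. All function names and the toy feature arithmetic below are illustrative stand-ins, not the paper's actual models or fusion operators:

```python
def heavy_features(frame):
    """Stand-in for the foundation model's expensive spatial features.
    In the real system this would be a large network pass running
    asynchronously in the background."""
    return [2.0 * x for x in frame]

def light_fuse(memory, frame, alpha=0.7):
    """Stand-in for the lightweight model's complementary fusion of
    cached memory with the current observation."""
    return [alpha * m + (1.0 - alpha) * x for m, x in zip(memory, frame)]

def run_async_schedule(frames, refresh_every=4):
    """Amortize the heavy model over time: refresh memory only every
    `refresh_every` frames; run the light model on every frame and
    autoregressively write its output back into the memory."""
    memory = heavy_features(frames[0])        # initial background pass
    depths = []
    for t, frame in enumerate(frames):
        if t > 0 and t % refresh_every == 0:  # background refresh lands
            memory = heavy_features(frame)
        depth = light_fuse(memory, frame)     # real-time foreground inference
        memory = depth                        # autoregressive memory update
        depths.append(depth)
    return depths
```

Between refreshes the memory drifts toward the cheap model's estimates, which mirrors the "graceful degradation between refreshes" the abstract reports; a background refresh periodically snaps the memory back to high-quality features.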
Problem

Research questions and friction points this paper is trying to address.

monocular depth estimation
real-time perception
edge deployment
computational redundancy
foundation model
Innovation

Methods, ideas, or system contributions that make the work stand out.

asynchronous inference
monocular depth estimation
spatial memory
foundation model
edge deployment
Lianjie Ma
School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
Yuquan Li
Associate Professor at Guizhou University
AI for Science · Molecule Design
Bingzheng Jiang
School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, China
Ziming Zhong
ShanghaiTech University
Han Ding
School of Mechanical Science and Engineering, Huazhong University of Science and Technology, Wuhan, China
Lijun Zhu
Purdue University