🤖 AI Summary
Deploying monocular depth estimation (MDE) models on edge ASICs is challenging due to high computational and memory overhead. To address this, we propose a 4-bit post-training quantization (PTQ) framework featuring a novel activation polishing and compensation algorithm alongside a weight reconstruction method, effectively mitigating the accuracy degradation inherent in ultra-low-bit quantization. We further design a domain-specific ASIC accelerator supporting kernel fusion and programmable instructions. Experiments demonstrate that our approach maintains state-of-the-art accuracy on benchmarks such as KITTI (δ₁ > 92%), achieves a 3.8× speedup in inference latency, and delivers 12.6 TOPS/W energy efficiency—marking the first high-accuracy, real-time MDE deployment on edge ASICs. Our core contributions are: (i) lightweight, low-bit quantization algorithms tailored for MDE; (ii) a hardware–software co-designed 4-bit MDE-specific accelerator architecture; and (iii) end-to-end validation of edge deployment.
📝 Abstract
Monocular Depth Estimation (MDE) has emerged as a pivotal task in computer vision, supporting numerous real-world applications. However, deploying accurate depth estimation models on resource-limited edge devices, especially Application-Specific Integrated Circuits (ASICs), is challenging due to their high computational and memory demands. Recent foundation models for depth estimation deliver impressive results but further amplify the difficulty of deployment on ASICs. To address this, we propose QuartDepth, which applies post-training quantization to MDE models together with hardware acceleration for ASICs. Our approach quantizes both weights and activations to 4-bit precision, reducing model size and computation cost. To mitigate the resulting performance degradation, we introduce an activation polishing and compensation algorithm applied before and after activation quantization, as well as a weight reconstruction method that minimizes weight quantization error. Furthermore, we design a flexible and programmable hardware accelerator that supports kernel fusion and customized instruction programmability, enhancing throughput and efficiency. Experimental results demonstrate that our framework achieves competitive accuracy while enabling fast inference and higher energy efficiency on ASICs, bridging the gap between high-performance depth estimation and practical edge-device applicability. Code: https://github.com/shawnricecake/quart-depth
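The abstract does not spell out QuartDepth's polishing, compensation, or reconstruction steps, so the sketch below shows only the naive baseline such PTQ methods improve upon: symmetric uniform 4-bit quantization of a tensor to integers in [-8, 7] with a per-row scale. All function names here are illustrative, not from the paper's codebase.

```python
import numpy as np

def quantize_4bit(x: np.ndarray, axis: int = -1):
    """Naive symmetric 4-bit PTQ: round x/scale into the signed int4 range [-8, 7].

    The scale is chosen per slice along `axis` so the largest magnitude maps to 7,
    i.e. no value is clipped. Real PTQ pipelines (including, presumably, QuartDepth)
    refine this with calibration and error-compensation steps not shown here.
    """
    amax = np.max(np.abs(x), axis=axis, keepdims=True)
    scale = amax / 7.0 + 1e-12          # tiny epsilon guards against all-zero slices
    q = np.clip(np.round(x / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Map int4 codes back to approximate float values."""
    return q.astype(np.float32) * scale

# Example: quantize a fake weight matrix row-wise and measure reconstruction error.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 16)).astype(np.float32)
q, s = quantize_4bit(w, axis=-1)
w_hat = dequantize(q, s)
max_err = float(np.abs(w - w_hat).max())  # bounded by half a quantization step
```

Because the scale is set from the per-row max, rounding error is at most `scale / 2` per element; it is exactly this error that weight-reconstruction and activation-compensation methods aim to shrink further.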