CMAX-CAMEL: A Coarse-to-Fine Adaptive, Memory-Efficient, and Low-Power Edge Processor for Contrast Maximization

📅 2026-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the input-dependent computation, frequent memory accesses, and resulting difficulty of real-time, low-power edge deployment of contrast maximization (CMAX), all of which stem from its iterative warp-and-accumulate pipeline. The authors propose CMAX-CAMEL, a hardware-software co-designed edge processor built around memory efficiency. The architecture combines a banked parallel memory organization, a subsampling-coupled accumulation structure, and a runtime-adaptive coarse-to-fine scheduling strategy that follows the observed event distribution. On a Virtex FPGA prototype operating at 200 MHz, the design improves motion estimation accuracy by up to 19% over fixed coarse-to-fine schedules, reduces processing latency by 53.3%, lowers effective memory accesses by 42%, and cuts total system energy by 52.2%, including adaptation overheads.

📝 Abstract

Contrast maximization (CMAX) is a direct geometric framework for event-based motion estimation, but its iterative warp-and-accumulate pipeline incurs input-dependent computation and frequent memory accesses, challenging real-time, low-power edge deployment. We present CMAX-CAMEL, a coarse-to-fine adaptive, memory-efficient, low-power edge processor for CMAX. CMAX-CAMEL combines a runtime-adaptive execution strategy with a memory-centric processor architecture. It adjusts coarse-to-fine execution according to the observed event distribution, prioritizing stages likely to improve estimation accuracy while suppressing low-value iterations and unnecessary stage transitions. Architecturally, a banked parallel memory organization sustains real-time throughput while reducing latency, and a subsampling-coupled accumulation structure lowers memory-access activity along the warp-and-accumulate dataflow. On a Virtex FPGA prototype operating at 200 MHz, CMAX-CAMEL improves estimation accuracy by up to 19% over fixed coarse-to-fine schedules, reduces processing latency by 53.3%, lowers effective memory accesses by 42%, and cuts total system energy by 52.2%, including adaptation overheads. These results show that CMAX-CAMEL is an HW-SW co-design that co-optimizes execution policy and data movement for real-time, low-power event-based motion estimation at the edge.
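To make the warp-and-accumulate pipeline concrete, the following is a minimal software sketch of CMAX, not the CMAX-CAMEL hardware design. It assumes a purely translational motion model, uses the variance of the image of warped events as the contrast objective, and replaces any real optimizer with a small grid search run on a fixed coarse-to-fine schedule (the kind of baseline the abstract compares against, not the adaptive policy). The function names, the `scale` subsampling parameter, and the search settings are all hypothetical choices made for illustration.

```python
import numpy as np


def warp_and_accumulate(xs, ys, ts, velocity, shape, scale=1):
    """Warp events back to t = 0 along a constant velocity and count them per pixel.

    With scale > 1 the counts go into a grid subsampled by that factor,
    which stands in for a coarse level of the schedule (an assumption).
    """
    vx, vy = velocity
    h, w = shape[0] // scale, shape[1] // scale
    # Warp each event to the reference time, then map it to the (coarse) grid.
    wx = np.floor((xs - vx * ts) / scale).astype(int)
    wy = np.floor((ys - vy * ts) / scale).astype(int)
    inside = (wx >= 0) & (wx < w) & (wy >= 0) & (wy < h)
    image = np.zeros((h, w))
    # np.add.at accumulates correctly when several events hit the same pixel.
    np.add.at(image, (wy[inside], wx[inside]), 1.0)
    return image


def contrast(image):
    """Contrast objective: variance of the image of warped events."""
    return image.var()


def coarse_to_fine_search(xs, ys, ts, shape, scales=(4, 2, 1), span=8.0, steps=9):
    """Fixed-schedule grid search: refine the velocity estimate level by level."""
    best = np.array([0.0, 0.0])
    for scale in scales:
        offsets = np.linspace(-span, span, steps)
        candidates = [(best[0] + dx, best[1] + dy) for dx in offsets for dy in offsets]
        scores = [
            contrast(warp_and_accumulate(xs, ys, ts, c, shape, scale))
            for c in candidates
        ]
        best = np.array(candidates[int(np.argmax(scores))])
        span /= 2.0  # halve the search range before moving to the finer level
    return best


# Synthetic events: a 4x4 grid of points drifting at (4, -2) pixels per unit time.
t = np.linspace(0.0, 1.0, 50)
x0, y0 = np.meshgrid(16.5 + 8.0 * np.arange(4), 20.5 + 8.0 * np.arange(4))
xs = (x0.ravel()[:, None] + 4.0 * t[None, :]).ravel()
ys = (y0.ravel()[:, None] - 2.0 * t[None, :]).ravel()
ts = np.tile(t, 16)

estimate = coarse_to_fine_search(xs, ys, ts, shape=(64, 64))
print("estimated velocity:", estimate)
```

Every candidate velocity triggers a full pass over the events and a fresh accumulation buffer, which illustrates why the iteration count and the accumulation resolution dominate both latency and memory traffic in this pipeline. Accumulating into a subsampled grid at the coarse levels shrinks the buffer being written, which is loosely the effect the subsampling-coupled accumulation structure targets in hardware.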

Problem

Research questions and friction points this paper is trying to address.

contrast maximization

event-based motion estimation

memory access

low-power edge computing

real-time processing

Innovation

Methods, ideas, or system contributions that make the work stand out.

coarse-to-fine adaptation

memory-efficient architecture

event-based motion estimation