🤖 AI Summary
This work addresses three problems that contrast maximization (CMAX) faces on edge devices because of its iterative warp-and-accumulate pipeline: high computational load, frequent memory accesses, and difficulty meeting real-time, low-power deployment targets. To overcome these limitations, the authors propose a memory-centric, hardware-software co-designed edge processor. The architecture combines a banked parallel memory organization, a subsampling-coupled accumulation structure, and a runtime-adaptive coarse-to-fine scheduling strategy to accelerate CMAX efficiently on an FPGA. On a 200 MHz Virtex FPGA prototype, the approach improves motion-estimation accuracy by up to 19% over fixed coarse-to-fine schedules. It also reduces processing latency by 53.3%, lowers effective memory accesses by 42%, and cuts total system energy by 52.2%.
📝 Abstract
Contrast maximization (CMAX) is a direct geometric framework for event-based motion estimation, but its iterative warp-and-accumulate pipeline incurs input-dependent computation and frequent memory accesses, challenging real-time, low-power edge deployment. We present CMAX-CAMEL, a coarse-to-fine adaptive, memory-efficient, low-power edge processor for CMAX. CMAX-CAMEL combines a runtime-adaptive execution strategy with a memory-centric processor architecture. It adjusts coarse-to-fine execution according to the observed event distribution, prioritizing stages likely to improve estimation accuracy while suppressing low-value iterations and unnecessary stage transitions. Architecturally, a banked parallel memory organization sustains real-time throughput while reducing latency, and a subsampling-coupled accumulation structure lowers memory-access activity along the warp-and-accumulate dataflow. On a Virtex FPGA prototype operating at 200 MHz, CMAX-CAMEL improves estimation accuracy by up to 19% over fixed coarse-to-fine schedules, reduces processing latency by 53.3%, lowers effective memory accesses by 42%, and cuts total system energy by 52.2%, including adaptation overheads. These results show that CMAX-CAMEL, an HW-SW co-design that co-optimizes execution policy and data movement, supports real-time, low-power event-based motion estimation at the edge.
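To make the warp-and-accumulate loop concrete, here is a minimal software sketch of CMAX with a fixed coarse-to-fine schedule, the kind of baseline the abstract compares against. It assumes a constant optical-flow warp x' = x - v·t to a reference time t_ref = 0, nearest-pixel voting into an image of warped events (IWE), the IWE variance as the contrast objective, and a grid search over (vx, vy). Coarse stages are modeled simply by accumulating into a grid subsampled by an integer factor; this is an assumption, since the abstract does not specify how the subsampling-coupled accumulation structure works. The names `Event`, `warp_and_accumulate`, `contrast`, and `coarse_to_fine` are hypothetical, and neither the runtime-adaptive scheduling nor the banked memory organization is modeled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    x: float  # pixel column
    y: float  # pixel row
    t: float  # timestamp in seconds, relative to the reference time t_ref = 0


def warp_and_accumulate(events: List[Event], vx: float, vy: float,
                        width: int, height: int, scale: int = 1) -> List[int]:
    """Warp events to t_ref = 0 under a constant flow (vx, vy) in px/s and
    accumulate them into an image of warped events (IWE).

    scale > 1 accumulates into a (width // scale) x (height // scale) grid,
    a simple stand-in (assumption) for coarse-stage subsampling."""
    w, h = width // scale, height // scale
    iwe = [0] * (w * h)
    for e in events:
        xw = e.x - vx * e.t  # warp back along the candidate flow
        yw = e.y - vy * e.t
        xi = int(round(xw / scale))  # nearest-pixel vote on the (coarse) grid
        yi = int(round(yw / scale))
        if 0 <= xi < w and 0 <= yi < h:  # drop events warped out of view
            iwe[yi * w + xi] += 1
    return iwe


def contrast(iwe: List[int]) -> float:
    """Variance of the IWE: better-aligned (sharper) images score higher."""
    n = len(iwe)
    mean = sum(iwe) / n
    return sum((v - mean) ** 2 for v in iwe) / n


def coarse_to_fine(events: List[Event], width: int, height: int,
                   v_range: float = 100.0, steps: int = 5,
                   levels: int = 3, scale0: int = 4) -> Tuple[float, float]:
    """Fixed coarse-to-fine grid search over (vx, vy). Each level evaluates a
    steps x steps grid around the current best, then shrinks the search window
    to one grid spacing and halves the IWE subsampling factor."""
    best = (0.0, 0.0)
    span, scale = v_range, scale0
    for _ in range(levels):
        spacing = 2 * span / (steps - 1)
        offsets = [-span + i * spacing for i in range(steps)]
        center, best_score = best, float("-inf")
        for dx in offsets:
            for dy in offsets:
                cand = (center[0] + dx, center[1] + dy)
                iwe = warp_and_accumulate(events, cand[0], cand[1],
                                          width, height, scale)
                score = contrast(iwe)
                if score > best_score:
                    best_score, best = score, cand
        span = spacing
        scale = max(1, scale // 2)
    return best


# Synthetic scene: three point features all moving at (50, -50) px/s.
true_v = (50.0, -50.0)
events = [Event(x0 + true_v[0] * t, y0 + true_v[1] * t, t)
          for (x0, y0) in [(12, 40), (20, 48), (28, 44)]
          for t in [k * 0.025 for k in range(20)]]
est = coarse_to_fine(events, width=64, height=64)
print(est)  # expected: (50.0, -50.0)
```

Every level re-warps and re-accumulates all events, which is why the number of levels and grid points drives both computation and memory traffic. A fixed schedule like this one always runs every level; the adaptive policy described in the abstract would instead decide at runtime which stages are worth executing.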