🤖 AI Summary
Existing optical flow methods that sample local costs from a dense all-pairs correlation volume incur computational and memory cost quadratic in the number of pixels, making it infeasible to be both accurate and efficient on ultra-high-definition (UHD) images; the memory-efficient alternative of computing costs on demand is slower in practice. This work proposes an efficient correlation volume sampling implementation that remains mathematically equivalent to the operator defined by RAFT. It outperforms on-demand sampling by up to 90% while keeping memory usage low, and matches the default dense implementation with up to 95% less memory; since cost sampling accounts for a significant share of runtime, this translates to up to 50% savings in end-to-end inference in memory-constrained settings. Evaluated on an 8K ultra-high-resolution dataset together with an inference-time modification of the recent SEA-RAFT method, the approach achieves state-of-the-art accuracy and efficiency at high resolutions.
📝 Abstract
Recent optical flow estimation methods often employ local cost sampling from a dense all-pairs correlation volume, which results in computational and memory complexity quadratic in the number of pixels. Although an alternative memory-efficient implementation with on-demand cost computation exists, it is slower in practice, so prior methods typically process images at reduced resolutions and miss fine-grained details. To address this, we propose a more efficient implementation of all-pairs correlation volume sampling that exactly matches the mathematical operator defined by RAFT. Our approach outperforms on-demand sampling by up to 90% while maintaining low memory usage, and performs on par with the default implementation with up to 95% lower memory usage. As cost sampling makes up a significant portion of the overall runtime, this can translate to up to 50% savings for total end-to-end model inference in memory-constrained environments. Our evaluation of existing methods includes an 8K ultra-high-resolution dataset and an additional inference-time modification of the recent SEA-RAFT method. With this, we achieve state-of-the-art results at high resolutions both in accuracy and efficiency.
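To make the trade-off concrete, here is a minimal NumPy sketch (not the paper's implementation, and far from its optimized kernels) contrasting the two baselines the abstract compares: precomputing the dense all-pairs correlation volume, which costs O(N²) memory for N pixels, versus computing each cost on demand at lookup time, which needs only O(N) memory but repeats dot products. All shapes (`H`, `W`, feature dimension `D`, lookup radius `R`) are illustrative; both functions realize the same mathematical operator, as RAFT defines it.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, D, R = 8, 8, 4, 1          # tiny frame size, feature dim, lookup radius
f1 = rng.standard_normal((H * W, D)).astype(np.float32)  # frame-1 features
f2 = rng.standard_normal((H * W, D)).astype(np.float32)  # frame-2 features

# Dense baseline: materialize the full N x N volume up front -> O(N^2) memory.
dense_volume = f1 @ f2.T          # shape (H*W, H*W)

def lookup_dense(i, j):
    """Sample a (2R+1)^2 cost window around pixel (i, j) from the dense volume."""
    out = np.zeros((2 * R + 1, 2 * R + 1), np.float32)
    for di in range(-R, R + 1):
        for dj in range(-R, R + 1):
            y, x = i + di, j + dj
            if 0 <= y < H and 0 <= x < W:
                out[di + R, dj + R] = dense_volume[i * W + j, y * W + x]
    return out

def lookup_on_demand(i, j):
    """Same operator, but each cost is a dot product at lookup time -> O(N) memory."""
    out = np.zeros((2 * R + 1, 2 * R + 1), np.float32)
    for di in range(-R, R + 1):
        for dj in range(-R, R + 1):
            y, x = i + di, j + dj
            if 0 <= y < H and 0 <= x < W:
                out[di + R, dj + R] = f1[i * W + j] @ f2[y * W + x]
    return out

# Both implementations return identical cost windows for every pixel.
assert all(
    np.allclose(lookup_dense(i, j), lookup_on_demand(i, j))
    for i in range(H) for j in range(W)
)
```

The on-demand variant avoids the quadratic volume but, as the abstract notes, is slower in practice because costs are recomputed per lookup; the paper's contribution is an implementation that keeps memory low without paying that slowdown.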