Rethinking Dense Optical Flow without Test-Time Scaling

📅 2026-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high computational cost of dense optical flow methods that rely on iterative refinement at test time. The authors propose a single forward-pass approach that removes test-time iterations, arguing that semantic and geometric priors from pretrained foundation models can reduce, if not replace, iterative refinement. Specifically, the method extracts semantic features from a frozen DINO-v2 backbone and fuses them with geometric cues from a monocular depth foundation model, then estimates flow by global matching. On Sintel Final, the approach achieves an end-point error (EPE) of 2.81 without refinement, improving over SEA-RAFT under comparable training conditions and outperforming RAFT, GMFlow (without refinement), and FlowSeek in the same setting, while showing strong cross-dataset generalization.
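For readers unfamiliar with the metric, EPE is conventionally the mean Euclidean distance, in pixels, between predicted and ground-truth flow vectors. The following is a minimal sketch of that standard definition, not the paper's evaluation code; the function name `end_point_error` and the toy values are illustrative assumptions.

```python
import math


def end_point_error(pred, gt):
    """Mean Euclidean distance between predicted and ground-truth flow.

    pred, gt: equal-length sequences of (u, v) displacements in pixels.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    if len(pred) != len(gt) or not pred:
        raise ValueError("flow fields must be non-empty and the same size")
    total = sum(
        math.hypot(pu - gu, pv - gv)
        for (pu, pv), (gu, gv) in zip(pred, gt)
    )
    return total / len(pred)


# Toy example: one pixel is off by a (3, 4) vector, the other is exact.
pred = [(3.0, 4.0), (1.0, 0.0)]
gt = [(0.0, 0.0), (1.0, 0.0)]
epe = end_point_error(pred, gt)  # per-pixel errors are 5.0 and 0.0
```

Lower is better: an EPE of 2.81 means predicted flow vectors are, on average, 2.81 pixels away from the ground truth.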

📝 Abstract

Recent progress in dense optical flow has been driven by increasingly complex architectures and multi-step refinement for test-time scaling. While these approaches achieve strong benchmark performance, they also require substantial computation during inference. This raises a fundamental question: Is scaling test-time computation the only way to improve dense optical flow accuracy? We argue that it is not. Instead, powerful visual semantic and geometric priors encoded in modern foundation models can reduce, if not overcome, the need for computationally expensive iterative refinement at test time. In this paper, we present a framework that estimates dense optical flow in a single forward pass, leveraging pretrained foundation representations while avoiding iterative refinement and additional inference-time computation, thus offering an alternative to test-time scaling. Our method extracts visual semantic features from a frozen DINO-v2 backbone and combines them with geometric cues from a monocular depth foundation model. We fuse these complementary priors into a unified representation and apply a global matching formulation to estimate dense correspondences without recurrent updates or test-time optimization. Despite avoiding iterative refinement, our approach achieves strong cross-dataset generalization across challenging benchmarks. On Sintel Final, we obtain 2.81 EPE without refinement, significantly improving over the state-of-the-art (SOTA) SEA-RAFT under comparable training conditions and outperforming RAFT, GMFlow (without refinement), and the recent FlowSeek in the same setting. These results suggest that strong foundation priors can substitute for test-time scaling, offering a computationally efficient alternative to refinement-heavy pipelines.
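The global matching step mentioned in the abstract can be illustrated with a generic softmax-based formulation, as popularized by GMFlow-style methods. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the inputs stand in for the already fused semantic and geometric features, and the function name `global_matching_flow`, the `temperature` parameter, and the toy one-hot features are all hypothetical.

```python
import numpy as np


def global_matching_flow(feat1, feat2, temperature=1.0):
    """Estimate dense flow in one pass via all-pairs softmax matching.

    feat1, feat2: (H, W, C) feature maps for frame 1 and frame 2
    (assumed here to be the fused representations).
    Returns an (H, W, 2) flow field with (dx, dy) per pixel.
    """
    h, w, c = feat1.shape
    f1 = feat1.reshape(h * w, c)
    f2 = feat2.reshape(h * w, c)

    # All-pairs correlation between every pixel of frame 1 and frame 2.
    corr = f1 @ f2.T / (np.sqrt(c) * temperature)

    # Row-wise softmax (shifted by the row max for numerical stability).
    corr -= corr.max(axis=1, keepdims=True)
    prob = np.exp(corr)
    prob /= prob.sum(axis=1, keepdims=True)

    # Pixel coordinate grid as (x, y) pairs, one row per pixel.
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    grid = np.stack([xs, ys], axis=-1).reshape(h * w, 2).astype(np.float64)

    # Expected matching position in frame 2, minus the source position.
    matched = prob @ grid
    return (matched - grid).reshape(h, w, 2)


# Toy check: distinctive one-hot features, frame 2 shifted right by 1 pixel.
feat1 = 50.0 * np.eye(16).reshape(4, 4, 16)
feat2 = np.roll(feat1, 1, axis=1)
flow = global_matching_flow(feat1, feat2)
```

In this toy case the recovered horizontal flow is 1 pixel everywhere except the last column, where `np.roll` wraps the content around to column 0. No recurrent update or per-sample optimization is involved: the flow comes from a single correlation and softmax, which is the property the abstract emphasizes.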

Problem

Research questions and friction points this paper is trying to address.

dense optical flow

test-time scaling

computational efficiency

iterative refinement

foundation models

Innovation

Methods, ideas, or system contributions that make the work stand out.

foundation models

dense optical flow

test-time scaling