CoWTracker: Tracking by Warping instead of Correlation

📅 2026-02-04
🤖 AI Summary
This work proposes a dense point tracker that does not rely on cost volumes, whose quadratic complexity in spatial resolution limits efficiency and scalability. Instead, the method warps target-frame features back to the query frame using the current trajectory estimates and applies a spatiotemporal Transformer for joint reasoning across all tracks, which establishes long-range correspondences efficiently. By using warping-based feature alignment for dense point tracking, the approach avoids explicit feature correlation and brings point tracking and optical flow estimation under one architecture. It achieves state-of-the-art performance on the TAP-Vid-DAVIS, TAP-Vid-Kinetics, and Robo-TAP benchmarks, and it sometimes outperforms specialized optical flow methods on Sintel, KITTI, and Spring.
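To make the scaling argument concrete, the following is a rough back-of-the-envelope sketch, not taken from the paper. It assumes an all-pairs cost volume between two H x W feature maps and compares its entry count with that of a single warped feature map. The function names and the channel count of 128 are illustrative assumptions.

```python
def cost_volume_entries(height: int, width: int) -> int:
    """Entries in an all-pairs cost volume between two H x W feature maps."""
    n = height * width          # number of spatial positions per frame
    return n * n                # one similarity score per pair of positions


def warped_feature_entries(height: int, width: int, channels: int) -> int:
    """Entries in one warped feature map (one C-dim vector per position)."""
    return height * width * channels


if __name__ == "__main__":
    channels = 128  # assumed feature dimension, for illustration only
    for side in (64, 128, 256):
        cv = cost_volume_entries(side, side)
        wf = warped_feature_entries(side, side, channels)
        print(f"{side}x{side}: cost volume {cv:,} entries, warped features {wf:,} entries")
```

Under these assumptions, doubling the side length multiplies the cost volume by 16 but the warped feature map by only 4, which is the quadratic-versus-linear gap in the number of spatial positions that the summary refers to.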

📝 Abstract
Dense point tracking is a fundamental problem in computer vision, with applications ranging from video analysis to robotic manipulation. State-of-the-art trackers typically rely on cost volumes to match features across frames, but this approach incurs quadratic complexity in spatial resolution, limiting scalability and efficiency. In this paper, we propose CoWTracker, a novel dense point tracker that eschews cost volumes in favor of warping. Inspired by recent advances in optical flow, our approach iteratively refines track estimates by warping features from the target frame to the query frame based on the current estimate. Combined with a transformer architecture that performs joint spatiotemporal reasoning across all tracks, our design establishes long-range correspondences without computing feature correlations. Our model is simple and achieves state-of-the-art performance on standard dense point tracking benchmarks, including TAP-Vid-DAVIS, TAP-Vid-Kinetics, and Robo-TAP. Remarkably, the model also excels at optical flow, sometimes outperforming specialized methods on the Sintel, KITTI, and Spring benchmarks. These results suggest that warping-based architectures can unify dense point tracking and optical flow estimation.
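The warp-and-refine loop described in the abstract can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: `warp_features` uses plain bilinear sampling with border clamping, and `predict_update` is a hypothetical stand-in for the learned spatiotemporal transformer, whose actual inputs and outputs are not specified here.

```python
import numpy as np


def warp_features(target_feats: np.ndarray, coords: np.ndarray) -> np.ndarray:
    """Bilinearly sample target_feats (H, W, C) at coords (H, W, 2), stored as (x, y).

    The output at query pixel (i, j) is the target feature at the current
    estimate of where that pixel has moved to.
    """
    h, w, _ = target_feats.shape
    x = np.clip(coords[..., 0], 0, w - 1)   # clamp samples to the image border
    y = np.clip(coords[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = (x - x0)[..., None]                # fractional offsets, broadcast over C
    wy = (y - y0)[..., None]
    top = (1 - wx) * target_feats[y0, x0] + wx * target_feats[y0, x1]
    bottom = (1 - wx) * target_feats[y1, x0] + wx * target_feats[y1, x1]
    return (1 - wy) * top + wy * bottom


def refine_tracks(query_feats, target_feats, init_coords, predict_update, num_iters=4):
    """Iteratively refine per-pixel track estimates without building a cost volume.

    predict_update is a hypothetical callable (query_feats, warped, coords) -> delta
    of shape (H, W, 2); in the paper this role is played by a learned transformer.
    """
    coords = init_coords.copy()
    for _ in range(num_iters):
        warped = warp_features(target_feats, coords)   # align target to query frame
        coords = coords + predict_update(query_feats, warped, coords)
    return coords


if __name__ == "__main__":
    feats = np.arange(12.0).reshape(3, 4, 1)
    ys, xs = np.mgrid[0:3, 0:4]
    identity = np.stack([xs, ys], axis=-1).astype(float)
    no_update = lambda q, w, c: np.zeros_like(c)       # trivial placeholder update
    out = refine_tracks(feats, feats, identity, no_update)
    print(np.allclose(out, identity))
```

Each iteration touches one feature vector per query position, so the per-iteration cost grows linearly with the number of positions rather than with the number of position pairs.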
Problem

Research questions and friction points this paper is trying to address.

dense point tracking
cost volume
quadratic complexity
feature matching
scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

warping-based tracking
cost-volume-free
dense point tracking
optical flow unification
spatiotemporal transformer