AllTracker: Efficient Dense Point Tracking at High Resolution

📅 2025-06-08

🤖 AI Summary

Existing methods struggle to deliver high-resolution output, dense (all-pixel) coverage, and long-range tracking at the same time: optical flow methods relate a frame only to the next frame, while point tracking methods do not deliver high-resolution, dense correspondence fields. AllTracker estimates long-range point tracks by estimating the flow field between a query frame and every other frame of a video, relating one frame to hundreds of subsequent frames. Its architecture performs iterative inference on low-resolution grids of correspondence estimates, propagating information spatially with 2D convolution layers and temporally with pixel-aligned attention layers. The model is fast and parameter-efficient (16 million parameters) and delivers state-of-the-art point tracking accuracy at 768×1024 resolution on a 40 GB GPU. Training on a wider set of datasets proves crucial for top performance.

📝 Abstract

We introduce AllTracker: a model that estimates long-range point tracks by way of estimating the flow field between a query frame and every other frame of a video. Unlike existing point tracking methods, our approach delivers high-resolution and dense (all-pixel) correspondence fields, which can be visualized as flow maps. Unlike existing optical flow methods, our approach corresponds one frame to hundreds of subsequent frames, rather than just the next frame. We develop a new architecture for this task, blending techniques from existing work in optical flow and point tracking: the model performs iterative inference on low-resolution grids of correspondence estimates, propagating information spatially via 2D convolution layers, and propagating information temporally via pixel-aligned attention layers. The model is fast and parameter-efficient (16 million parameters), and delivers state-of-the-art point tracking accuracy at high resolution (i.e., tracking 768x1024 pixels, on a 40G GPU). A benefit of our design is that we can train on a wider set of datasets, and we find that doing so is crucial for top performance. We provide an extensive ablation study on our architecture details and training recipe, making it clear which details matter most. Our code and model weights are available at https://alltracker.github.io .
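The relationship between dense flow fields and point tracks described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes each flow field stores a per-pixel displacement `(dx, dy)` from the query frame to frame `t`, and the function name `tracks_from_flow` and the toy data are hypothetical.

```python
def tracks_from_flow(flows, query_points):
    """Convert query-to-frame flow fields into point tracks.

    flows[t][y][x] is an assumed (dx, dy) displacement that carries the
    query-frame pixel (x, y) to its location in frame t.
    Returns tracks[t][i], the (x, y) position of query point i in frame t.
    """
    tracks = []
    for flow in flows:
        frame_positions = []
        for (x, y) in query_points:
            dx, dy = flow[y][x]
            # A track position is the query coordinate plus its displacement.
            frame_positions.append((x + dx, y + dy))
        tracks.append(frame_positions)
    return tracks


# Toy example: a 2x2 query frame and 3 frames, where every pixel
# drifts right by one pixel per frame.
num_frames, height, width = 3, 2, 2
flows = [
    [[(float(t), 0.0) for _ in range(width)] for _ in range(height)]
    for t in range(num_frames)
]

# Dense tracking means every pixel of the query frame is a query point.
all_pixels = [(x, y) for y in range(height) for x in range(width)]
tracks = tracks_from_flow(flows, all_pixels)
```

Because every flow field is anchored to the same query frame, each track can be read off directly, with no chaining of frame-to-frame flows.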

Problem

Research questions and friction points this paper is trying to address.

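Estimating long-range point tracks across the frames of a video

Delivering high-resolution, dense (all-pixel) correspondence fields, which existing point tracking methods do not provide

Relating one frame to hundreds of subsequent frames, where existing optical flow methods handle only the next frame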

Innovation

Methods, ideas, or system contributions that make the work stand out.

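Estimates the flow field between a query frame and every other frame of a video

Uses iterative inference on low-resolution grids of correspondence estimates

Combines 2D convolution layers (spatial propagation) with pixel-aligned attention layers (temporal propagation)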
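The idea of pixel-aligned temporal propagation can be sketched as follows. The source only names "pixel-aligned attention layers", so everything else here is an assumption: the sketch uses single-head scaled dot-product self-attention with no learned projections, applied across time independently at each grid location. The function names are hypothetical, and this is not the paper's implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pixel_aligned_attention(features):
    """Attend across frames separately at each pixel location.

    features[t][y][x] is a list of C floats. A pixel at (y, x) in frame t
    attends only to the same (y, x) location in every frame, so no
    information moves spatially in this step.
    """
    num_frames = len(features)
    height = len(features[0])
    width = len(features[0][0])
    channels = len(features[0][0][0])
    scale = 1.0 / math.sqrt(channels)

    output = [[[None] * width for _ in range(height)] for _ in range(num_frames)]
    for y in range(height):
        for x in range(width):
            # The temporal sequence of feature vectors at this one pixel.
            sequence = [features[t][y][x] for t in range(num_frames)]
            for t, query in enumerate(sequence):
                scores = [
                    scale * sum(q * k for q, k in zip(query, key))
                    for key in sequence
                ]
                weights = softmax(scores)
                # Weighted average of the per-frame vectors (values = inputs).
                output[t][y][x] = [
                    sum(w * value[c] for w, value in zip(weights, sequence))
                    for c in range(channels)
                ]
    return output


# Toy grid: 3 frames, 1x2 pixels, 2 channels.
features = [
    [[[1.0, 2.0], [1.0, 0.0]]],
    [[[1.0, 2.0], [0.0, 1.0]]],
    [[[1.0, 2.0], [1.0, 1.0]]],
]
out = pixel_aligned_attention(features)
```

Restricting attention to a single pixel location keeps the cost per pixel quadratic in the number of frames only; in the described design, spatial mixing is left to the 2D convolution layers.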
