🤖 AI Summary
Traditional feature matching methods (e.g., SIFT) suffer from high computational cost and limited keypoint density. To address this, we propose an efficient dense matching method leveraging motion vectors (MVs) extracted directly from the AV1 compressed domain. Our approach parses encoder-generated MVs, refines them via sub-pixel interpolation, and applies cosine-similarity-based consistency filtering to produce high-density, sub-pixel-accurate short trajectories suitable for SfM/SLAM frontends. This work constitutes the first use of AV1 MVs directly in visual reconstruction pipelines. Evaluated on a 117-frame video, our method registers every frame and reconstructs 460K–620K 3D points with reprojection errors of 0.51–0.53 pixels, while using far less CPU than sequential SIFT. The method balances matching density, geometric accuracy, and computational efficiency, showing potential for scalable, end-to-end 3D reconstruction deployment.
📝 Abstract
We repurpose AV1 motion vectors to produce dense sub-pixel correspondences and short tracks filtered by cosine consistency. On short videos, this compressed-domain front end runs comparably to sequential SIFT while using far less CPU, and yields denser matches with competitive pairwise geometry. As a small SfM demo on a 117-frame clip, MV matches register all images and reconstruct 0.46-0.62M points at 0.51-0.53 px reprojection error; BA time grows with match density. These results show compressed-domain correspondences are a practical, resource-efficient front end with clear paths to scaling in full pipelines.
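The abstract does not spell out how the cosine consistency filter is defined. As a rough illustration only, the sketch below assumes one plausible reading: a motion vector is kept when its direction agrees with the summed direction of its eight neighboring blocks. The function name `cosine_consistency_mask`, the 3x3 neighborhood, and the 0.9 threshold are all hypothetical choices for this example, not details taken from the paper.

```python
import numpy as np

def cosine_consistency_mask(mv, threshold=0.9, eps=1e-8):
    """Keep motion vectors whose direction agrees with their neighbours.

    mv: (H, W, 2) array of per-block motion vectors (dx, dy) in pixels.
    Returns a boolean (H, W) mask; True means the vector passes the filter.
    Assumption: the 3x3 window and threshold are illustrative, not from the paper.
    """
    mv = np.asarray(mv, dtype=np.float64)
    h, w, _ = mv.shape
    # Replicate border vectors so edge blocks also see a full 3x3 window.
    padded = np.pad(mv, ((1, 1), (1, 1), (0, 0)), mode="edge")
    # Sum the 3x3 window, then subtract the centre to leave the 8 neighbours.
    neigh = np.zeros_like(mv)
    for dy in range(3):
        for dx in range(3):
            neigh += padded[dy:dy + h, dx:dx + w]
    neigh -= mv
    # Cosine similarity between each vector and its neighbourhood sum.
    dot = np.sum(mv * neigh, axis=-1)
    norm = np.linalg.norm(mv, axis=-1) * np.linalg.norm(neigh, axis=-1)
    cos = dot / np.maximum(norm, eps)
    # Zero-length vectors get similarity 0 and are rejected by this sketch.
    return cos >= threshold
```

With a uniform field containing a single vector that points the opposite way, this mask rejects only that vector. A real front end would also need a rule for static blocks, since zero motion has no defined direction and is simply dropped here.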