GOT-Edit: Geometry-Aware Generic Object Tracking via Online Model Editing

📅 2026-02-09
🤖 AI Summary
This work addresses a limitation of existing generic object tracking (GOT) methods: they rely predominantly on 2D features and do not explicitly model 3D geometric structure, which makes them degrade under occlusion, distractors, and variations in appearance and geometry. To overcome this, the paper introduces GOT-Edit, an online cross-modality model editing mechanism for generic object trackers. Using features from a pre-trained Visual Geometry Grounded Transformer, the approach infers 3D geometric cues from only a few 2D images of a video stream and incorporates them through null-space constrained parameter updates. This fuses 2D semantic and 3D geometric information while preserving the model's original semantic discrimination. Experiments on multiple GOT benchmarks show improved accuracy and robustness, particularly under occlusion and clutter.

📝 Abstract
Human perception for effective object tracking in a 2D video stream arises from the implicit use of prior 3D knowledge combined with semantic reasoning. In contrast, most generic object tracking (GOT) methods primarily rely on 2D features of the target and its surroundings while neglecting 3D geometric cues, which makes them susceptible to partial occlusion, distractors, and variations in geometry and appearance. To address this limitation, we introduce GOT-Edit, an online cross-modality model editing approach that integrates geometry-aware cues into a generic object tracker from a 2D video stream. Our approach leverages features from a pre-trained Visual Geometry Grounded Transformer to enable geometric cue inference from only a few 2D images. To tackle the challenge of seamlessly combining geometry and semantics, GOT-Edit performs online model editing with null-space constrained updates that incorporate geometric information while preserving semantic discrimination, yielding consistently better performance across diverse scenarios. Extensive experiments on multiple GOT benchmarks demonstrate that GOT-Edit achieves superior robustness and accuracy, particularly under occlusion and clutter, establishing a new paradigm for combining 2D semantics with 3D geometric reasoning for generic object tracking.
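
The idea of a null-space constrained update can be illustrated with a small sketch. It is not the paper's implementation: the abstract does not say how the null space is computed or which parameters are edited, so the function names, the toy weight matrix `W`, and the use of Gram-Schmidt below are all assumptions made for illustration. The sketch projects a raw update so that it has no effect on a set of "preserved" input features, which leaves the layer's outputs on those features unchanged while still changing its response to other inputs.

```python
# Illustrative sketch only (hypothetical names, not from the paper).
# A linear layer y = W x is edited with an update restricted to the
# null space of a set of preserved input features.

def orthonormal_basis(vectors, tol=1e-10):
    """Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            coeff = sum(a * b for a, b in zip(w, q))
            w = [a - coeff * b for a, b in zip(w, q)]
        norm = sum(a * a for a in w) ** 0.5
        if norm > tol:  # skip vectors already in the span
            basis.append([a / norm for a in w])
    return basis


def project_rows_to_null_space(delta, preserved_keys):
    """Remove from each row of `delta` its component in span(preserved_keys).

    Afterwards every row is orthogonal to every preserved key, so the
    projected update maps each preserved key to zero.
    """
    basis = orthonormal_basis(preserved_keys)
    projected = []
    for row in delta:
        r = list(row)
        for q in basis:
            coeff = sum(a * b for a, b in zip(r, q))
            r = [a - coeff * b for a, b in zip(r, q)]
        projected.append(r)
    return projected


def matvec(matrix, vec):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def matadd(a, b):
    """Element-wise sum of two matrices of equal shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Toy data (assumed): a 2x3 weight matrix, two preserved input features
# standing in for semantic features, and a raw update standing in for
# an edit driven by geometric cues.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
preserved_keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
raw_delta = [[0.5, -0.2, 0.3], [0.1, 0.4, -0.6]]

constrained_delta = project_rows_to_null_space(raw_delta, preserved_keys)
W_edited = matadd(W, constrained_delta)
```

In this toy setting `matvec(W_edited, k)` equals `matvec(W, k)` for each preserved key `k`, while inputs outside their span, such as `[0.0, 0.0, 1.0]`, produce a different output after the edit.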
Problem

Research questions and friction points this paper is trying to address.

generic object tracking
3D geometric cues
occlusion
distractors
appearance variation
Innovation

Methods, ideas, or system contributions that make the work stand out.

geometry-aware tracking
online model editing
null-space constrained update
generic object tracking
3D geometric reasoning