🤖 AI Summary
This work addresses the challenge of linguistic reference discontinuity and re-identification difficulty in fixed-view videos caused by prolonged occlusion or object departure. To maintain referential coherence during target absence, the authors propose constructing an offline anchor library from the static background, where text-aligned anchor maps serve as persistent semantic memory. An anchor-driven re-entry prior combined with displacement-aware cues enables a lightweight ReID-Gating mechanism for efficient target recapture, without requiring initial-frame visibility or explicit modeling of appearance dynamics. Experiments demonstrate a 10.3% improvement in recapture rate and a 24.2% reduction in latency over the strongest baseline, while ablation studies confirm the contribution of each component.
📝 Abstract
Long-term language-guided referring in fixed-view videos is challenging: the referent may be occluded or leave the scene for long intervals and later re-enter, while framewise referring pipelines drift as re-identification (ReID) becomes unreliable. We present AR2-4FV, which leverages background stability for long-term referring. An offline Anchor Bank is distilled from static background structures; at inference, the text query is aligned with this bank to produce an Anchor Map that serves as persistent semantic memory while the referent is absent. An anchor-based re-entry prior accelerates re-capture upon return, and a lightweight ReID-Gating mechanism maintains identity continuity using displacement cues in the anchor frame. The system predicts per-frame bounding boxes without assuming the target is visible in the first frame or explicitly modeling appearance variations. AR2-4FV improves Re-Capture Rate (RCR) by 10.3% and reduces Re-Capture Latency (RCL) by 24.2% over the best baseline, and ablation studies further confirm the benefits of the Anchor Map, re-entry prior, and ReID-Gating.
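The gating idea described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the paper's implementation: it assumes the Anchor Map is a 2D spatial likelihood over the fixed view, models the displacement cue as a Gaussian around the last observed position, and uses a simple threshold for the ReID gate; the function names, score combination, and threshold are all invented for illustration.

```python
import numpy as np

def reentry_prior(anchor_map, last_pos, sigma=3.0):
    """Combine a text-aligned anchor map with a displacement-aware
    Gaussian around the last observed position (hypothetical form;
    the paper's exact formulation may differ)."""
    h, w = anchor_map.shape
    ys, xs = np.mgrid[0:h, 0:w]
    disp = np.exp(-((ys - last_pos[0]) ** 2 + (xs - last_pos[1]) ** 2)
                  / (2 * sigma ** 2))
    prior = anchor_map * disp
    return prior / (prior.sum() + 1e-8)  # normalize to a distribution

def reid_gate(candidates, reid_scores, prior, tau=0.5):
    """Lightweight gating sketch: accept the candidate whose ReID
    similarity, weighted by the re-entry prior at its location,
    exceeds a threshold; otherwise report the target as absent."""
    best, best_score = None, tau
    for (y, x), s in zip(candidates, reid_scores):
        # prior.size rescales the density so a uniform prior weighs 1.0
        score = s * prior[y, x] * prior.size
        if score > best_score:
            best, best_score = (y, x), score
    return best
```

With an anchor map peaked at a re-entry region (say, a doorway), a moderately similar detection near that region is accepted over a higher-similarity detection far from it, which is the intuition behind combining the re-entry prior with displacement cues.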