SelfMOTR: Revisiting MOTR with Self-Generating Detection Priors

📅 2025-11-25
🤖 AI Summary
To address weak detection performance and the conflict between detection and association in end-to-end multi-object tracking (MOT), this paper proposes a self-generated detection prior mechanism. It first uncovers and exploits the strong detection capability inherent in MOTR-like models, which removes the need for an external detector and lets the Transformer decoder generate high-quality detection priors on its own. A dedicated prior fusion strategy then supports joint optimization of detection and association. Ablation studies validate the effectiveness of each component. On the DanceTrack benchmark, the method performs competitively with recent state-of-the-art end-to-end trackers, with reported gains in IDF1 (+2.3%) and MOTA (+1.8%) over prior end-to-end approaches. The framework thus mitigates task interference while preserving the architectural simplicity of end-to-end MOT.
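
For readers unfamiliar with the two metrics named in the summary, the sketch below computes MOTA and IDF1 from their standard definitions. The function names and the example counts are illustrative assumptions and are not taken from the paper. The counts are assumed to be accumulated over all frames of a sequence by an evaluation toolkit.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT, with counts summed over all frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives + id_switches) / num_gt


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    denom = 2 * idtp + idfp + idfn
    # With no identity matches and no errors the score is undefined; return 0.
    return 2 * idtp / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the call pattern.
    print(f"MOTA: {mota(10, 5, 5, 100):.3f}")
    print(f"IDF1: {idf1(80, 10, 30):.3f}")
```

MOTA is dominated by detection errors (misses and false positives), while IDF1 rewards keeping the same identity on the same object over time, which is why the two are usually reported together.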

📝 Abstract
Despite progress toward end-to-end tracking with transformer architectures, poor detection performance and the conflict between detection and association in a joint architecture remain critical concerns. Recent approaches aim to mitigate these issues by (i) employing advanced denoising or label assignment strategies, or (ii) incorporating detection priors from external object detectors via distillation or anchor proposal techniques. Inspired by the success of integrating detection priors and by the key insight that MOTR-like models are secretly strong detection models, we introduce SelfMOTR, a novel tracking transformer that relies on self-generated detection priors. Through extensive analysis and ablation studies, we uncover and demonstrate the hidden detection capabilities of MOTR-like models, and present a practical set of tools for leveraging them effectively. On DanceTrack, SelfMOTR achieves strong performance, competing with recent state-of-the-art end-to-end tracking methods.
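
To make the notion of a detection prior concrete, here is a minimal, generic sketch of how detections from a first pass might be filtered into proposals for newly appearing objects. It is an assumption for illustration only, not SelfMOTR's actual mechanism, which the abstract does not specify. The names `select_priors`, `score_thresh`, and `iou_thresh` are hypothetical.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_priors(detections: Sequence[Tuple[Box, float]],
                  track_boxes: Sequence[Box],
                  score_thresh: float = 0.5,
                  iou_thresh: float = 0.5) -> List[Box]:
    """Keep confident detections that no existing track already covers.

    Hypothetical filter: the thresholds are illustrative defaults,
    not values from the paper.
    """
    priors = []
    for box, score in detections:
        if score < score_thresh:
            continue  # drop low-confidence detections
        if any(iou(box, t) >= iou_thresh for t in track_boxes):
            continue  # already explained by a tracked object
        priors.append(box)
    return priors


if __name__ == "__main__":
    dets = [((0, 0, 10, 10), 0.9),          # overlaps the existing track
            ((100, 100, 110, 110), 0.8),    # confident and unmatched
            ((200, 200, 210, 210), 0.2)]    # below the score threshold
    tracks = [(1, 1, 11, 11)]
    print(select_priors(dets, tracks))
```

In methods that rely on an external detector, the `detections` input would come from that detector. The abstract's point is that a MOTR-like model can supply these detections itself.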
Problem

Research questions and friction points this paper is trying to address.

Addresses poor detection performance in transformer-based tracking models
Resolves conflict between detection and association in joint architectures
Leverages self-generated detection priors without external detectors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-generating detection priors for tracking
Leveraging hidden detection capabilities of MOTR
Practical tools for effective detection integration
Fabian Gülhan
Institute of Imaging and Computer Vision, RWTH Aachen University, Germany
Emil Mededovic
Institute of Imaging and Computer Vision, RWTH Aachen University, Germany
Yuli Wu
RWTH Aachen University
Computer Vision, Retinal Prosthesis
Johannes Stegmaier
RWTH Aachen University
3D+t Image Analysis, Machine Learning, Microscopy, Developmental Biology, Medical Image Analysis