SAMOFT: Robust Multi-Object Tracking via Region and Flow

📅 2026-05-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the degradation of instance-level features in multi-object tracking under object deformation, nonlinear motion, and occlusion. The authors propose an online tracking framework that combines pixel-level segmentation masks from the Segment Anything Model (SAM) with dense optical flow. The framework introduces three key components: pixel-wise motion matching, centroid distance matching, and a training-free distribution correction module. It also uses Kalman filter-based motion prediction and Cluster-Aware ReID to improve trajectory consistency. On the DanceTrack and MOTChallenge benchmarks, the method consistently improves baseline trackers and performs competitively with recent state-of-the-art methods, supporting the value of pixel-level cues for tracking robustness.
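The summary lists centroid distance matching, which the abstract describes only as flexible mask-based centroid matching for low-confidence or partially occluded detections. The sketch below illustrates the general idea under our own assumptions, not the authors' implementation: masks are binary 2-D lists, track centroids come from some motion prediction, and a greedy nearest-pair assignment with a hypothetical distance gate `max_dist` stands in for whatever assignment step SAMOFT actually uses (a Hungarian solver would be the more common choice). The function names `mask_centroid` and `centroid_match` are illustrative.

```python
import math
from typing import List, Tuple

Mask = List[List[int]]          # binary mask, mask[row][col] in {0, 1}
Point = Tuple[float, float]     # (x, y) in pixel coordinates


def mask_centroid(mask: Mask) -> Point:
    """Return the (x, y) centroid of the foreground pixels of a binary mask."""
    xs = ys = 0.0
    n = 0
    for y, row in enumerate(mask):
        for x, v in enumerate(row):
            if v:
                xs += x
                ys += y
                n += 1
    if n == 0:
        raise ValueError("empty mask has no centroid")
    return xs / n, ys / n


def centroid_match(track_centroids: List[Point],
                   det_centroids: List[Point],
                   max_dist: float) -> List[Tuple[int, int]]:
    """Greedy one-to-one matching of tracks to detections by centroid distance.

    Hypothetical stand-in for CDM: candidate pairs farther apart than
    max_dist are discarded, then the closest remaining pairs are taken first.
    Returns (track_idx, det_idx) pairs.
    """
    candidates = []
    for i, (tx, ty) in enumerate(track_centroids):
        for j, (dx, dy) in enumerate(det_centroids):
            d = math.hypot(tx - dx, ty - dy)
            if d <= max_dist:
                candidates.append((d, i, j))
    candidates.sort()  # closest pairs first
    used_tracks, used_dets, matches = set(), set(), []
    for _, i, j in candidates:
        if i not in used_tracks and j not in used_dets:
            used_tracks.add(i)
            used_dets.add(j)
            matches.append((i, j))
    return matches
```

For example, tracks predicted at (10, 10) and (50, 50) with detection centroids at (52, 49) and (11, 9) and `max_dist=5.0` yield the pairs `(0, 1)` and `(1, 0)`. Matching on mask centroids rather than box overlap is one plausible reason such a step could tolerate partial occlusion, since a truncated box can still have a centroid close to the predicted one.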

📝 Abstract

Multi-object tracking (MOT) is a fundamental task in computer vision that requires continuously tracking multiple targets while maintaining consistent identities across frames. However, most existing approaches primarily rely on instance-level object features for trajectory association, which often leads to degraded performance under challenging conditions such as object deformation, nonlinear motion, and occlusion. In this work, we propose SAMOFT, a robust tracker that leverages pixel-level cues to improve robustness under complex motion scenarios. Specifically, we introduce a Pixel Motion Matching (PMM) module that integrates the Segment Anything Model (SAM) with dense optical flow to refine Kalman filter-based motion prediction using instantaneous foreground pixel motion. To further enhance robustness under unreliable detections, we design a Centroid Distance Matching (CDM) module that performs flexible mask-based centroid matching for low-confidence or partially occluded observations. Moreover, a Distribution-Based Correction (DBC) module models long-tailed motion patterns in a training-free manner using historical optical flow statistics and dynamically corrects trajectory states online. We also incorporate a Cluster-Aware ReID (CA-ReID) strategy to improve the stability and discriminative power of trajectory appearance features. Extensive experiments on the DanceTrack and MOTChallenge benchmarks demonstrate that SAMOFT consistently improves baseline trackers and achieves competitive performance compared with recent state-of-the-art methods, validating the effectiveness of leveraging pixel-level cues for robust multi-object tracking.
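The abstract says the Distribution-Based Correction (DBC) module uses historical optical flow statistics to correct trajectory states online without training, but it does not give the model. As a rough illustration of "training-free correction from running statistics" only, the sketch below keeps per-track running flow statistics with Welford's algorithm and clamps a predicted per-frame displacement to mean ± k·std. This Gaussian-style clamp is our own simplifying assumption and does not reproduce the paper's long-tailed modeling; `RunningFlowStats`, `correct_velocity`, `k`, and `min_obs` are hypothetical names and parameters.

```python
import math


class RunningFlowStats:
    """Running mean/variance of a 1-D flow displacement (Welford's algorithm).

    Hypothetical helper: one instance per track and per axis (x or y).
    """

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, value: float) -> None:
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; 0.0 until at least two samples exist.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def correct_velocity(predicted: float, stats: RunningFlowStats,
                     k: float = 3.0, min_obs: int = 5) -> float:
    """Clamp a predicted displacement to [mean - k*std, mean + k*std].

    With fewer than min_obs historical samples the prediction is
    returned unchanged, since the statistics are not yet reliable.
    """
    if stats.n < min_obs:
        return predicted
    lo = stats.mean - k * stats.std
    hi = stats.mean + k * stats.std
    return min(max(predicted, lo), hi)
```

After feeding displacements 2.0, 2.2, 1.8, 2.1, 1.9 into a `RunningFlowStats`, a predicted displacement of 2.3 passes through unchanged while an outlier such as 10.0 is pulled back to about 2.47. Because the statistics are updated incrementally, this kind of correction needs no training data and runs in constant time per frame.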

Problem

Research questions and challenges this paper addresses.

multi-object tracking

occlusion

nonlinear motion

object deformation

trajectory association

Innovation

Methods, ideas, or system contributions that make the work stand out.

Pixel Motion Matching

Segment Anything Model

Dense Optical Flow
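Pixel Motion Matching pairs SAM masks with dense optical flow to refine Kalman filter-based prediction using instantaneous foreground pixel motion. A minimal sketch of that idea, assuming the mask and the flow field are already computed and aligned at the same resolution, averages the flow over the mask's foreground pixels, shifts the previous box by that displacement, and blends the result with the Kalman-predicted box. The blending weight `alpha` and the function names are hypothetical; the paper's actual refinement rule is not given in this summary.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]           # (x1, y1, x2, y2)
Flow = List[List[Tuple[float, float]]]            # flow[row][col] = (dx, dy)


def mean_mask_flow(mask: List[List[int]], flow: Flow) -> Optional[Tuple[float, float]]:
    """Average (dx, dy) of the dense flow over the mask's foreground pixels.

    Returns None when the mask has no foreground pixels.
    """
    sx = sy = 0.0
    n = 0
    for y, row in enumerate(mask):
        for x, v in enumerate(row):
            if v:
                dx, dy = flow[y][x]
                sx += dx
                sy += dy
                n += 1
    if n == 0:
        return None
    return sx / n, sy / n


def refine_prediction(prev_box: Box, kalman_box: Box, mask: List[List[int]],
                      flow: Flow, alpha: float = 0.5) -> Box:
    """Blend a flow-shifted box with the Kalman-predicted box.

    alpha is the weight on the flow-based box (hypothetical parameter).
    Falls back to the Kalman prediction if the mask is empty.
    """
    motion = mean_mask_flow(mask, flow)
    if motion is None:
        return kalman_box
    dx, dy = motion
    x1, y1, x2, y2 = prev_box
    flow_box = (x1 + dx, y1 + dy, x2 + dx, y2 + dy)
    return tuple(alpha * f + (1 - alpha) * k for f, k in zip(flow_box, kalman_box))
```

Using only foreground pixels keeps background motion from contaminating the estimate, which is the kind of benefit the mask-plus-flow pairing would be expected to provide; a real tracker would likely weight the two sources adaptively rather than with a fixed `alpha`.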