SAMOFT: Robust Multi-Object Tracking via Region and Flow

📅 2026-05-10
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the degradation of instance-level features in multi-object tracking under object deformation, nonlinear motion, and occlusion. The authors propose SAMOFT, an online tracking framework that fuses pixel-level segmentation regions from the Segment Anything Model (SAM) with dense optical flow. The framework introduces three modules: Pixel Motion Matching, which refines Kalman filter-based motion prediction with foreground pixel motion; Centroid Distance Matching, for low-confidence or partially occluded observations; and a training-free Distribution-Based Correction module. A Cluster-Aware ReID strategy further stabilizes trajectory appearance features. On the DanceTrack and MOTChallenge benchmarks, the method consistently improves baseline trackers and performs competitively with recent state-of-the-art methods, supporting the value of pixel-level cues for robust tracking.
📝 Abstract
Multi-object tracking (MOT) is a fundamental task in computer vision that requires continuously tracking multiple targets while maintaining consistent identities across frames. However, most existing approaches primarily rely on instance-level object features for trajectory association, which often leads to degraded performance under challenging conditions such as object deformation, nonlinear motion, and occlusion. In this work, we propose SAMOFT, a robust tracker that leverages pixel-level cues to improve robustness under complex motion scenarios. Specifically, we introduce a Pixel Motion Matching (PMM) module that integrates the Segment Anything Model (SAM) with dense optical flow to refine Kalman filter-based motion prediction using instantaneous foreground pixel motion. To further enhance robustness under unreliable detections, we design a Centroid Distance Matching (CDM) module that performs flexible mask-based centroid matching for low-confidence or partially occluded observations. Moreover, a Distribution-Based Correction (DBC) module models long-tailed motion patterns in a training-free manner using historical optical flow statistics and dynamically corrects trajectory states online. We also incorporate a Cluster-Aware ReID (CA-ReID) strategy to improve the stability and discriminative power of trajectory appearance features. Extensive experiments on the DanceTrack and MOTChallenge benchmarks demonstrate that SAMOFT consistently improves baseline trackers and achieves competitive performance compared with recent state-of-the-art methods, validating the effectiveness of leveraging pixel-level cues for robust multi-object tracking.
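The idea of refining a Kalman prediction with foreground pixel motion, and of matching on mask centroids, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the plain-list data layout, and the fixed blend `weight` are all hypothetical, and the abstract does not say how SAMOFT actually combines the two motion estimates. Here the mask is assumed to be a 2D grid of booleans (as a segmentation model might yield) and the flow a same-sized grid of `(dx, dy)` displacements.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2)
Mask = List[List[bool]]                    # True marks a foreground pixel
Flow = List[List[Tuple[float, float]]]     # per-pixel (dx, dy) displacement


def mean_foreground_flow(mask: Mask, flow: Flow) -> Tuple[float, float]:
    """Average the flow vectors over foreground pixels only."""
    sum_dx = sum_dy = 0.0
    count = 0
    for mask_row, flow_row in zip(mask, flow):
        for is_foreground, (dx, dy) in zip(mask_row, flow_row):
            if is_foreground:
                sum_dx += dx
                sum_dy += dy
                count += 1
    if count == 0:
        return 0.0, 0.0  # empty mask: no motion evidence
    return sum_dx / count, sum_dy / count


def refine_prediction(kalman_box: Box, prev_box: Box, mask: Mask,
                      flow: Flow, weight: float = 0.5) -> Box:
    """Blend the Kalman-predicted box with the previous box shifted by
    the mean foreground flow. The linear blend is an assumption."""
    dx, dy = mean_foreground_flow(mask, flow)
    flow_box = (prev_box[0] + dx, prev_box[1] + dy,
                prev_box[2] + dx, prev_box[3] + dy)
    return tuple((1.0 - weight) * k + weight * f
                 for k, f in zip(kalman_box, flow_box))


def mask_centroid(mask: Mask) -> Optional[Tuple[float, float]]:
    """Return the (x, y) centroid of foreground pixels, or None if empty."""
    xs = [x for row in mask for x, fg in enumerate(row) if fg]
    ys = [y for y, row in enumerate(mask) for fg in row if fg]
    if not xs:
        return None
    return sum(xs) / len(xs), sum(ys) / len(ys)


# Usage on a toy 2x2 frame whose top row is foreground.
mask = [[True, True], [False, False]]
flow = [[(2.0, 0.0), (4.0, 2.0)], [(9.0, 9.0), (9.0, 9.0)]]
refined = refine_prediction((1, 1, 11, 11), (0, 0, 10, 10), mask, flow)
centroid = mask_centroid(mask)
```

Background flow vectors (the bottom row here) do not affect the estimate, which is the point of restricting the average to mask pixels. A centroid computed this way could then feed a distance-based association step for detections whose boxes are unreliable.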
Problem

Research questions and friction points this paper is trying to address.

multi-object tracking
occlusion
nonlinear motion
object deformation
trajectory association
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pixel Motion Matching
Segment Anything Model
Dense Optical Flow
Distribution-Based Correction
Cluster-Aware ReID
Authors
Yanchao Wang
School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
Dawei Zhang
Zhejiang Normal University
Computer Vision, Deep Learning, Multi-modal Fusion
Chengzhuan Yang
School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
Wei Liu
School of Automation and Intelligent Sensing, the Institute of Image Processing and Pattern Recognition, and the Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai 200240, China
Minglu Li
School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
Hua Wang
Professor, Victoria University
E-commerce, Access control, Cloud computing, Big data
Zhonglong Zheng
School of Computer Science and Technology, Zhejiang Normal University, Jinhua 321004, China
Ming-Hsuan Yang
University of California at Merced; Google DeepMind
Computer Vision, Machine Learning, Artificial Intelligence