🤖 AI Summary
This work addresses 3D detection and tracking of fast-moving small objects (e.g., racquetballs) with an RGB-D camera. We propose a physics-guided framework with three parts: (1) an enhanced YOLOv8-based RGB-D detector for robust initial localization; (2) a kinematics-driven tracking model that explicitly enforces velocity and acceleration constraints; and (3) an anomaly detection and adaptive correction module, integrated with Kalman filtering, that handles occlusions and abrupt motion changes. Evaluated on a custom high-frame-rate racquetball dataset, our method achieves up to 70% lower Average Displacement Error than Kalman-filter-based trackers. The core contribution is embedding kinematic priors into the tracking pipeline and closing the loop between detection, physical modeling, filtering, and anomaly-aware correction.
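The role of kinematic constraints in handling missed detections and outliers can be illustrated with a minimal sketch. This is not the authors' implementation: the constant-acceleration model, the gravity vector and z-up frame, the `gate` threshold, and the function names `predict` and `step` are all assumptions made for illustration.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]

# Assumption: z-up world frame, metres and seconds, gravity as the only acceleration.
GRAVITY: Vec3 = (0.0, 0.0, -9.81)


def predict(pos: Vec3, vel: Vec3, dt: float, acc: Vec3 = GRAVITY) -> Tuple[Vec3, Vec3]:
    """Constant-acceleration kinematics: p' = p + v*dt + 0.5*a*dt^2, v' = v + a*dt."""
    new_pos = tuple(p + v * dt + 0.5 * a * dt * dt for p, v, a in zip(pos, vel, acc))
    new_vel = tuple(v + a * dt for v, a in zip(vel, acc))
    return new_pos, new_vel


def step(pos: Vec3, vel: Vec3, dt: float,
         measurement: Optional[Vec3], gate: float = 0.5):
    """Advance a track by one frame.

    Returns (position, velocity, used_measurement). The physics prediction is
    used when the detection is missing or lies farther than `gate` metres
    (a hypothetical threshold) from the predicted position.
    """
    pred_pos, pred_vel = predict(pos, vel, dt)
    if measurement is None:
        return pred_pos, pred_vel, False  # missed detection: coast on physics
    dist = sum((m - p) ** 2 for m, p in zip(measurement, pred_pos)) ** 0.5
    if dist > gate:
        return pred_pos, pred_vel, False  # outlier: reject the detection
    # Inlier: accept the detection and re-estimate velocity by finite difference.
    new_vel = tuple((m - p) / dt for m, p in zip(measurement, pos))
    return measurement, new_vel, True
```

A fixed Euclidean gate is the simplest possible choice; a real system would more likely scale the gate with measurement uncertainty and also handle bounces, where the velocity changes abruptly.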
📝 Abstract
While computer vision has advanced considerably for general object detection and tracking, the specific problem of fast-moving tiny objects remains underexplored. This paper addresses the challenge of detecting and tracking rapidly moving small objects using an RGB-D camera. Our system combines deep learning-based detection with physics-based tracking to overcome the limitations of existing approaches. Our contributions include: (1) a comprehensive system design for detecting and tracking fast-moving small objects in 3D space, (2) a physics-based tracking algorithm that integrates kinematic equations of motion to handle outliers and missed detections, and (3) an outlier detection and correction module that significantly improves tracking performance in challenging scenarios such as occlusions and rapid direction changes. We evaluated the proposed system on a custom racquetball dataset, where it outperforms Kalman-filter-based trackers with up to 70% lower Average Displacement Error. The system has applications in improving robot perception on autonomous platforms and demonstrates the effectiveness of combining physics-based models with deep learning for real-time 3D detection and tracking of challenging small objects.
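For readers unfamiliar with the metric, Average Displacement Error is commonly computed as the mean Euclidean distance between predicted and ground-truth positions over a trajectory. The sketch below follows that common definition; the abstract does not spell out the exact variant the authors use, and the function name and the sample points in the usage note are illustrative only.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def average_displacement_error(predicted: Sequence[Vec3],
                               ground_truth: Sequence[Vec3]) -> float:
    """Mean Euclidean distance between matched predicted and true 3D positions."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("trajectories must be non-empty and of equal length")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)
```

As a usage example with made-up points, a two-frame trajectory that is exact in the first frame and off by 2 m in the second has an error of 1.0 m: `average_displacement_error([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 2)])`.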