A Modular Pipeline for 3D Object Tracking Using RGB Cameras

📅 2025-03-06
🤖 AI Summary
This work addresses the challenges of 3D multi-object tracking (MOT) in multi-view RGB setups with small targets, dense occlusion, frequent object entry and exit, and unknown camera poses. We propose a lightweight, modular 3D MOT framework that integrates multi-view geometry, feature matching, PnP-based pose estimation, and extended Kalman filtering (EKF). Our key methodological contribution is the first automatic, instance-aware EKF architecture that supports dynamic object creation and deletion; it eliminates the need for prior camera calibration while enabling real-time, covariance-aware 3D trajectory estimation. Evaluated on the Table Setting Dataset, which comprises nearly ten million camera frames, our approach achieves centimeter-level average localization accuracy across hundreds of trials, and the estimated covariance matrices effectively quantify positional uncertainty. The framework significantly enhances the robustness and deployability of 3D MOT in complex, unstructured environments.
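The summary mentions PnP-based pose estimation for cameras whose poses are unknown. As a minimal illustration of recovering a camera from 3D-2D correspondences, the sketch below swaps in linear DLT resection instead of a PnP solver: it estimates the full 3x4 projection matrix (intrinsics included) from at least six known 3D reference points and their pixel detections, then reads off the camera center. The function names and the availability of surveyed 3D reference points are hypothetical assumptions, not taken from the authors' code; a practical version would also normalize coordinates and refine the result non-linearly.

```python
import numpy as np

# Hypothetical helpers for illustration; not from the authors' repository.


def resect_camera_dlt(points_3d, points_2d):
    """Estimate a 3x4 projection matrix P (up to scale) by linear DLT.

    points_3d: (N, 3) world coordinates of known reference points (assumed available).
    points_2d: (N, 2) pixel coordinates of the same points in one camera, N >= 6.
    """
    X = np.asarray(points_3d, dtype=float)
    x = np.asarray(points_2d, dtype=float)
    if len(X) < 6 or len(X) != len(x):
        raise ValueError("need at least 6 matching 3D-2D correspondences")
    zero = np.zeros(4)
    rows = []
    for Xw, (u, v) in zip(X, x):
        Xh = np.append(Xw, 1.0)
        # u = (p1 . Xh) / (p3 . Xh) and v = (p2 . Xh) / (p3 . Xh), rearranged
        # into two equations that are linear in the 12 entries of P.
        rows.append(np.concatenate([Xh, zero, -u * Xh]))
        rows.append(np.concatenate([zero, Xh, -v * Xh]))
    # Least-squares solution with ||p|| = 1: the right singular vector
    # belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(np.vstack(rows))
    P = vt[-1].reshape(3, 4)
    return P / np.linalg.norm(P)


def camera_center(P):
    """World position of the camera: the null vector of P, dehomogenized."""
    _, _, vt = np.linalg.svd(np.asarray(P, dtype=float))
    C = vt[-1]
    return C[:3] / C[3]
```

Because DLT recovers intrinsics and extrinsics together, it needs no prior calibration, at the cost of requiring more correspondences than a PnP solver with known intrinsics.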

📝 Abstract
Object tracking is a key challenge in computer vision, with various applications that each require a different architecture. Most tracking systems have limitations: they constrain all movement to a 2D plane, and they often track only a single object. In this paper, we present a new modular pipeline that calculates the 3D trajectories of multiple objects. It is adaptable to various settings in which multiple time-synchronized, stationary off-the-shelf webcams record moving objects. Our pipeline was tested on the Table Setting Dataset, in which participants are recorded with various sensors as they set a table with tableware objects. We need to track these manipulated objects using six RGB webcams. Challenges include detecting small objects in 9,874,699 camera frames, determining camera poses, discriminating between nearby and overlapping objects, handling temporary occlusions, and finally calculating a 3D trajectory from the right subset of an average of 11.12.456 pixel coordinates per 3-minute trial. We implement a robust pipeline that produces accurate trajectories, with the covariance of the x, y, z position as a confidence metric. It handles appearing and disappearing objects dynamically by instantiating new Extended Kalman Filters. It scales to hundreds of table-setting trials with very little human annotation input, even though the camera poses of each trial are unknown. The code is available at https://github.com/LarsBredereke/object_tracking
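The last challenge listed above, calculating a 3D position from the right subset of pixel coordinates, can be illustrated with linear (DLT) triangulation plus a brute-force consistency check. This sketch assumes each camera's 3x4 projection matrix is already known and that one detection per camera has been associated with the same object. The subset rule (the largest camera subset whose reprojection error stays under a hypothetical `max_err_px` threshold) is an illustrative assumption, not necessarily the selection the paper uses.

```python
import itertools

import numpy as np

# Hypothetical helpers for illustration; not from the authors' repository.


def project(P, X):
    """Project a 3D point X with a 3x4 camera matrix P to pixel coordinates (u, v)."""
    x = np.asarray(P, dtype=float) @ np.append(X, 1.0)
    return x[:2] / x[2]


def triangulate_dlt(projections, pixels):
    """Linear DLT triangulation of one 3D point from two or more views."""
    rows = []
    for P, (u, v) in zip(projections, pixels):
        P = np.asarray(P, dtype=float)
        # u * (row 3 . X) - (row 1 . X) = 0, and likewise for v with row 2.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, vt = np.linalg.svd(np.vstack(rows))
    X = vt[-1]
    return X[:3] / X[3]


def triangulate_consistent_subset(projections, pixels, max_err_px=3.0):
    """Prefer the largest camera subset whose triangulated point reprojects
    within max_err_px (assumed threshold) in every view of that subset."""
    n = len(projections)
    for size in range(n, 1, -1):
        best = None
        for idx in itertools.combinations(range(n), size):
            X = triangulate_dlt([projections[i] for i in idx], [pixels[i] for i in idx])
            err = max(np.linalg.norm(project(projections[i], X) - pixels[i]) for i in idx)
            # Among equally large subsets, keep the one with the smallest error.
            if err <= max_err_px and (best is None or err < best[2]):
                best = (X, idx, err)
        if best is not None:
            return best[0], best[1]
    return None, ()
```

With six webcams there are only 57 subsets of two or more cameras, so exhaustive enumeration per frame is cheap; a RANSAC-style sampler would scale better to larger rigs.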
Problem

Develops a modular pipeline for 3D object tracking using RGB cameras.
Addresses challenges in tracking multiple objects with temporary occlusions.
Enables scalable, accurate 3D trajectory calculation with minimal human input.
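To show how an EKF can carry a 3D estimate through temporary occlusions while reporting x, y, z covariance, here is a minimal sketch with a constant-velocity state [x, y, z, vx, vy, vz] and a pinhole-projection measurement model, so the filter consumes raw (u, v) pixel detections from individual cameras. The class name `PixelEKF`, the motion model, and all noise values are assumptions for illustration; the paper's actual state and measurement design may differ.

```python
import numpy as np

# Hypothetical filter for illustration; not from the authors' repository.


class PixelEKF:
    """Constant-velocity EKF with state [x, y, z, vx, vy, vz] in world units.

    Measurements are pixel detections (u, v) from a camera with a known 3x4
    projection matrix, so the measurement model is non-linear.
    """

    def __init__(self, pos0, pos_var=0.05**2, vel_var=0.5**2,
                 accel_var=1.0, pixel_var=2.0**2):
        self.x = np.concatenate([np.asarray(pos0, dtype=float), np.zeros(3)])
        self.P = np.diag([pos_var] * 3 + [vel_var] * 3)
        self.accel_var = accel_var        # white-acceleration noise (assumed)
        self.R = pixel_var * np.eye(2)    # pixel detection noise (assumed)

    def predict(self, dt):
        """Propagate the state; called every frame, even while occluded."""
        F = np.eye(6)
        F[:3, 3:] = dt * np.eye(3)
        # Discrete white-noise acceleration, applied independently per axis.
        q = self.accel_var * np.array([[dt**4 / 4, dt**3 / 2],
                                       [dt**3 / 2, dt**2]])
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + np.kron(q, np.eye(3))

    def update(self, cam, uv):
        """Fuse one pixel detection from the camera with projection matrix cam."""
        cam = np.asarray(cam, dtype=float)
        a, b, w = cam @ np.append(self.x[:3], 1.0)
        h = np.array([a / w, b / w])      # predicted pixel position
        M = cam[:, :3]
        H = np.zeros((2, 6))              # Jacobian; velocity columns stay zero
        H[0, :3] = (M[0] * w - a * M[2]) / w**2
        H[1, :3] = (M[1] * w - b * M[2]) / w**2
        S = H @ self.P @ H.T + self.R
        K = np.linalg.solve(S, H @ self.P).T  # equals P H^T S^-1
        self.x = self.x + K @ (np.asarray(uv, dtype=float) - h)
        self.P = (np.eye(6) - K @ H) @ self.P

    def position_std(self):
        """Per-axis standard deviation of x, y, z: the confidence metric."""
        return np.sqrt(np.diag(self.P)[:3])
```

Because each camera contributes a separate `update`, views in which the object is occluded are simply skipped; while no camera sees it, only `predict` runs and the values from `position_std` grow until a view reacquires it.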
Innovation

Modular pipeline for 3D object tracking
Uses multiple stationary RGB webcams
Dynamically instantiates Extended Kalman Filters as objects appear
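The dynamic instantiation of filters can be sketched as a small track manager: each frame it predicts every track, gates candidate track-detection pairs by squared Mahalanobis distance, greedily assigns the cheapest pairs, spawns a new filter for every unmatched detection, and drops tracks that have gone unmatched for too long. Everything here is hypothetical (the class names, the 99% chi-square gate of 11.34 for three degrees of freedom, the `max_misses` limit). For brevity, each track runs a linear constant-velocity Kalman filter on already-triangulated 3D positions; with a linear measurement model the EKF equations reduce to these, and a pixel-space EKF could be substituted.

```python
import numpy as np

# Hypothetical track management for illustration; not from the authors' repository.


class Track:
    """One object instance: constant-velocity Kalman filter on 3D positions."""

    def __init__(self, track_id, pos, pos_var=0.02**2, vel_var=0.5**2):
        self.id = track_id
        self.x = np.concatenate([np.asarray(pos, dtype=float), np.zeros(3)])
        self.P = np.diag([pos_var] * 3 + [vel_var] * 3)
        self.misses = 0  # consecutive frames without an assigned detection

    def predict(self, dt, accel_var=1.0):
        F = np.eye(6)
        F[:3, 3:] = dt * np.eye(3)
        q = accel_var * np.array([[dt**4 / 4, dt**3 / 2], [dt**3 / 2, dt**2]])
        self.x = F @ self.x
        self.P = F @ self.P @ F.T + np.kron(q, np.eye(3))

    def innovation(self, z, meas_var):
        H = np.hstack([np.eye(3), np.zeros((3, 3))])  # observe position only
        S = H @ self.P @ H.T + meas_var * np.eye(3)
        return H, np.asarray(z, dtype=float) - H @ self.x, S

    def mahalanobis2(self, z, meas_var):
        _, y, S = self.innovation(z, meas_var)
        return float(y @ np.linalg.solve(S, y))

    def update(self, z, meas_var):
        H, y, S = self.innovation(z, meas_var)
        K = np.linalg.solve(S, H @ self.P).T  # equals P H^T S^-1
        self.x = self.x + K @ y
        self.P = (np.eye(6) - K @ H) @ self.P
        self.misses = 0


class TrackManager:
    """Gate, greedily associate, spawn new tracks, and delete stale ones."""

    def __init__(self, gate=11.34, max_misses=15, meas_var=0.01**2):
        self.gate = gate  # chi-square 99% quantile for 3 degrees of freedom
        self.max_misses = max_misses
        self.meas_var = meas_var
        self.tracks = []
        self._next_id = 0

    def step(self, detections, dt):
        for t in self.tracks:
            t.predict(dt)
        # All (distance, track index, detection index) pairs, cheapest first.
        pairs = sorted(
            (t.mahalanobis2(z, self.meas_var), ti, di)
            for ti, t in enumerate(self.tracks)
            for di, z in enumerate(detections)
        )
        used_t, used_d = set(), set()
        for d2, ti, di in pairs:
            if d2 > self.gate or ti in used_t or di in used_d:
                continue
            self.tracks[ti].update(detections[di], self.meas_var)
            used_t.add(ti)
            used_d.add(di)
        for ti, t in enumerate(self.tracks):
            if ti not in used_t:
                t.misses += 1
        # Unmatched detections start new instances; stale tracks are dropped.
        for di, z in enumerate(detections):
            if di not in used_d:
                self.tracks.append(Track(self._next_id, z))
                self._next_id += 1
        self.tracks = [t for t in self.tracks if t.misses <= self.max_misses]
        return self.tracks
```

Greedy assignment is the simplest choice; a Hungarian (optimal) assignment over the same gated cost matrix is a common alternative when objects sit close together.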