🤖 AI Summary
This work addresses the challenges of unstable multi-object tracking, redundant geometric representations, and difficult cross-modal fusion in panoramic imagery caused by wide-field distortion and occlusion. To this end, we propose a novel 3D multi-object tracking framework grounded in the unit sphere $\mathbb{S}^2$. Our approach introduces a joint state space on the sphere that unifies direction, scale, and depth, parameterizes object bearing via two-degree-of-freedom tangent-plane coordinates, and incorporates an extended spherical Kalman filter to fuse data from four fisheye cameras and a rotating LiDAR, ensuring geometrically consistent multi-modal tracking. Ground-truth trajectories are generated by aligning wearable device measurements with a global LiDAR map, enabling quantitative evaluation without a motion-capture system. Experiments demonstrate decimeter-level planar accuracy on real-world scenes captured in-house, significantly improved identity continuity in dynamic environments, and real-time performance on the Jetson AGX Orin platform.
📝 Abstract
Panoramic multi-object tracking is important for industrial safety monitoring, wide-area robotic perception, and infrastructure-light deployment in large workspaces. In these settings, the sensing system must provide full-surround coverage, metric geometric cues, and stable target association under wide field-of-view distortion and occlusion. Existing image-plane trackers are tightly coupled to the camera projection and become unreliable in panoramic imagery, while conventional Euclidean 3D formulations introduce redundant directional parameters and do not naturally unify angular, scale, and depth estimation. In this paper, we present $\mathbf{S^3KF}$, a panoramic 3D multi-object tracking framework built on a motorized rotating LiDAR and a quad-fisheye camera rig. The key idea is a geometry-consistent state representation on the unit sphere $\mathbb{S}^2$, where object bearing is modeled by a two-degree-of-freedom tangent-plane parameterization and jointly estimated with box scale and depth dynamics. Based on this state, we derive an extended spherical Kalman filtering pipeline that fuses panoramic camera detections with LiDAR depth observations for multimodal tracking. We further establish a map-based ground-truth generation pipeline using wearable localization devices registered to a shared global LiDAR map, enabling quantitative evaluation without motion-capture infrastructure. Experiments on self-collected real-world sequences show decimeter-level planar tracking accuracy, improved identity continuity over a 2D panoramic baseline in dynamic scenes, and real-time onboard operation on a Jetson AGX Orin platform. These results indicate that the proposed framework is a practical solution for panoramic perception and industrial-scale multi-object tracking. The project page can be found at https://kafeiyin00.github.io/S3KF/.
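The abstract does not spell out the exact parameterization used in $\mathbf{S^3KF}$, but the two-degree-of-freedom tangent-plane representation of a bearing on $\mathbb{S}^2$ is a standard construction (the exponential/log map of the sphere). The sketch below illustrates the general idea with numpy; all function names, the choice of tangent basis, and the numerical tolerances are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def tangent_basis(mu):
    """Orthonormal 3x2 basis of the tangent plane to S^2 at unit vector mu."""
    # Pick a helper axis not parallel to mu to build the basis deterministically.
    a = np.array([1.0, 0.0, 0.0]) if abs(mu[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    e1 = np.cross(mu, a)
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(mu, e1)
    return np.stack([e1, e2], axis=1)  # columns span the tangent plane

def exp_map(mu, delta):
    """Map 2-DoF tangent coordinates delta (in R^2) at mu to a point on S^2."""
    t = np.linalg.norm(delta)
    if t < 1e-12:
        return mu.copy()
    v = tangent_basis(mu) @ delta / t          # unit tangent direction in R^3
    return np.cos(t) * mu + np.sin(t) * v      # geodesic of arc length t

def log_map(mu, q):
    """Inverse: recover tangent coordinates at mu of a unit direction q."""
    c = np.clip(mu @ q, -1.0, 1.0)
    r = q - c * mu                             # component of q orthogonal to mu
    n = np.linalg.norm(r)
    if n < 1e-12:
        return np.zeros(2)
    return np.arccos(c) * (tangent_basis(mu).T @ r) / n
```

In an extended-Kalman-filter setting of this kind, the bearing error state would typically live in these local tangent coordinates (avoiding the redundant third parameter and the azimuth wrap-around of a Euclidean direction vector), with the estimate retracted back onto the sphere via the exponential map after each update.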