Persistent Object Gaussian Splat (POGS) for Tracking Human and Robot Manipulation of Irregularly Shaped Objects

📅 2025-03-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Continuous pose estimation and robust manipulation of previously unseen, irregularly shaped objects in dynamic environments remain challenging because of geometric ambiguity, occlusion, and unstructured disturbances. Method: The paper proposes Persistent Object Gaussian Splat (POGS), a task-oriented Gaussian Splatting (GS) representation that embeds semantics, self-supervised visual features, and object-grouping features in a compact model that can be updated continuously. After an initial multi-view scan and training phase, it uses a single stereo camera and requires neither CAD models nor rescanning. Pose updates combine depth estimates with self-supervised vision encoder features, and the refined poses support grasping, reorientation, natural-language-driven manipulation, and tool servoing. Results: The system achieves up to 12 consecutive successful object resets under human-induced perturbations and recovers from 80% of in-grasp tool perturbations, with tool perturbations of up to 30°.

📝 Abstract
Tracking and manipulating irregularly shaped, previously unseen objects in dynamic environments is important for robotic applications in manufacturing, assembly, and logistics. Recently introduced Gaussian Splats efficiently model object geometry, but lack persistent state estimation for task-oriented manipulation. We present Persistent Object Gaussian Splat (POGS), a system that embeds semantics, self-supervised visual features, and object grouping features into a compact representation that can be continuously updated to estimate the pose of scanned objects. POGS updates object states without requiring expensive rescanning or prior CAD models of objects. After an initial multi-view scene capture and training phase, POGS uses a single stereo camera to integrate depth estimates along with self-supervised vision encoder features for object pose estimation. By refining object pose estimates, POGS supports grasping, reorientation, and natural language-driven manipulation. This facilitates sequential object reset operations under human-induced object perturbations, as well as tool servoing, where robots recover tool pose despite tool perturbations of up to 30°. POGS achieves up to 12 consecutive successful object resets and recovers from 80% of in-grasp tool perturbations.
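
The abstract does not say how the pose update is computed, so the following is only an illustrative sketch, not the POGS method. It assumes that matched 3D points are already available, for example object points from the initial scan paired with points back-projected from stereo depth. Under that assumption, a rigid pose change can be estimated in closed form with the standard Kabsch (SVD) alignment. The function names `estimate_rigid_transform` and `rotation_angle_deg` are hypothetical helpers introduced here for illustration.

```python
import numpy as np


def estimate_rigid_transform(src, dst):
    """Least-squares rigid alignment (Kabsch): find R, t with dst ~= src @ R.T + t.

    src, dst: (N, 3) arrays of corresponding 3D points (assumed already matched).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    src_centroid = src.mean(axis=0)
    dst_centroid = dst.mean(axis=0)
    # Cross-covariance of the centered point sets.
    H = (src - src_centroid).T @ (dst - dst_centroid)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_centroid - R @ src_centroid
    return R, t


def rotation_angle_deg(R):
    """Angle of rotation matrix R in degrees, from its trace."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))
```

In this sketch, `rotation_angle_deg` gives a simple way to express how far an object or tool has rotated between two estimates, which is the kind of quantity the 30° perturbation figure refers to. A real system would also need outlier handling for noisy depth and imperfect matches.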
Problem

Research questions and friction points this paper is trying to address.

Tracking irregularly-shaped objects in dynamic environments
Estimating object pose without rescanning or CAD models
Facilitating robotic manipulation and tool pose recovery
Innovation

Methods, ideas, or system contributions that make the work stand out.

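POGS embeds semantics, self-supervised visual features, and object grouping features in a compact, continuously updatable representation
Uses a single stereo camera for object pose estimation after an initial multi-view scan
Supports grasping, reorientation, and natural language-driven manipulation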