rt-RISeg: Real-Time Model-Free Robot Interactive Segmentation for Active Instance-Level Object Understanding

📅 2025-07-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the challenge of real-time, accurate instance segmentation of unknown objects in novel environments for dexterous manipulation (e.g., grasping), this paper proposes a training-free, model-agnostic robotic interactive segmentation framework. Unlike static vision methods reliant on large-scale annotated datasets, the approach adopts a dynamic vision paradigm in which visual understanding evolves with robot motion: by analyzing the relative rotational and linear velocities of randomly sampled body frames during interaction, it extracts body frame-invariant features (BFIFs) that enable active, incremental instance segmentation. The method requires no pretraining and segments in real time, updating masks throughout each interaction rather than waiting for an action to finish. On unseen-object segmentation benchmarks, it achieves a 27.5% average accuracy improvement over state-of-the-art methods. Moreover, the generated segmentation masks serve as effective prompts for vision foundation models, significantly improving downstream performance.

📝 Abstract
Successful execution of dexterous robotic manipulation tasks in new environments, such as grasping, depends on the ability to proficiently segment unseen objects from the background and other objects. Previous works in unseen object instance segmentation (UOIS) train models on large-scale datasets, which often leads to overfitting on static visual features. This dependency results in poor generalization performance when confronted with out-of-distribution scenarios. To address this limitation, we rethink the task of UOIS based on the principle that vision is inherently interactive and occurs over time. We propose a novel real-time interactive perception framework, rt-RISeg, that continuously segments unseen objects by robot interactions and analysis of a designed body frame-invariant feature (BFIF). We demonstrate that the relative rotational and linear velocities of randomly sampled body frames, resulting from selected robot interactions, can be used to identify objects without any learned segmentation model. This fully self-contained segmentation pipeline generates and updates object segmentation masks throughout each robot interaction without the need to wait for an action to finish. We showcase the effectiveness of our proposed interactive perception method by achieving an average object segmentation accuracy rate 27.5% greater than state-of-the-art UOIS methods. Furthermore, although rt-RISeg is a standalone framework, we show that the autonomously generated segmentation masks can be used as prompts to vision foundation models for significantly improved performance.
Problem

Research questions and friction points this paper is trying to address.

How to segment unseen objects in real time without pre-trained models
How to use robot interactions to improve segmentation accuracy
How to avoid the poor generalization caused by overfitting to static visual features
Innovation

Methods, ideas, or system contributions that make the work stand out.

Real-time interactive perception framework rt-RISeg
Body frame-invariant feature (BFIF) for segmentation
No learned segmentation model; objects are identified from body-frame velocities induced by robot interaction
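The BFIF idea rests on a classical rigid-body fact: every frame rigidly attached to the same moving object shares the same angular velocity, and its linear velocity differs only by the lever-arm term, so the spatial twist is identical for all frames on that body. The sketch below illustrates this invariance under that assumption; the paper's exact BFIF formulation is not given on this page, and the function and variable names here are illustrative, not from the paper.

```python
import numpy as np

def bfif(p, v, w):
    """Sketch of a body frame-invariant feature: for a frame at
    position p with linear velocity v on a rigid body with angular
    velocity w, the spatial twist (w, v - w x p) is the same for
    every frame attached to that body."""
    return np.concatenate([w, v - np.cross(w, p)])

# Two frames sampled at different points on the same rigid body.
w = np.array([0.0, 0.0, 0.5])        # object angular velocity
v_body = np.array([0.1, 0.0, 0.0])   # velocity of the body point at the world origin
p1 = np.array([1.0, 0.0, 0.0])
p2 = np.array([0.0, 2.0, 0.0])
v1 = v_body + np.cross(w, p1)        # each frame's velocity includes the lever-arm term
v2 = v_body + np.cross(w, p2)

f1, f2 = bfif(p1, v1, w), bfif(p2, v2, w)
assert np.allclose(f1, f2)  # same rigid body -> same BFIF
```

Grouping sampled frames whose features agree (and separating those whose features differ) is what lets motion, rather than appearance, delineate object instances during an interaction.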
Howard H. Qian
Department of Computer Science, Rice University
Yiting Chen
Department of Computer Science, Rice University
Gaotian Wang
Department of Computer Science, Rice University
Podshara Chanrungmaneekul
Department of Computer Science, Rice University
Kaiyu Hang
Rice University
Robotic Grasping · Robotic Manipulation