rt-RISeg: Real-Time Model-Free Robot Interactive Segmentation for Active Instance-Level Object Understanding

📅 2025-07-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the challenge of real-time, accurate instance segmentation of unknown objects in novel environments for dexterous manipulation (e.g., grasping), this paper proposes a training-free, model-agnostic robotic interactive segmentation framework. Unlike static vision methods reliant on large-scale annotated datasets, the approach adopts a dynamic vision paradigm in which visual understanding evolves with robot motion: by analyzing the relative rotational and linear velocities of randomly sampled body frames during interaction, it extracts body frame-invariant features (BFIFs) that enable active, incremental instance segmentation. The method requires no pretraining and segments in real time, updating masks throughout each interaction rather than waiting for an action to finish. On unseen-object segmentation benchmarks, it achieves a 27.5% average accuracy improvement over state-of-the-art methods. Moreover, the generated segmentation masks serve as effective prompts for vision foundation models, significantly improving downstream performance.

📝 Abstract
Successful execution of dexterous robotic manipulation tasks in new environments, such as grasping, depends on the ability to proficiently segment unseen objects from the background and other objects. Previous works in unseen object instance segmentation (UOIS) train models on large-scale datasets, which often leads to overfitting on static visual features. This dependency results in poor generalization performance when confronted with out-of-distribution scenarios. To address this limitation, we rethink the task of UOIS based on the principle that vision is inherently interactive and occurs over time. We propose a novel real-time interactive perception framework, rt-RISeg, that continuously segments unseen objects by robot interactions and analysis of a designed body frame-invariant feature (BFIF). We demonstrate that the relative rotational and linear velocities of randomly sampled body frames, resulting from selected robot interactions, can be used to identify objects without any learned segmentation model. This fully self-contained segmentation pipeline generates and updates object segmentation masks throughout each robot interaction without the need to wait for an action to finish. We showcase the effectiveness of our proposed interactive perception method by achieving an average object segmentation accuracy rate 27.5% greater than state-of-the-art UOIS methods. Furthermore, although rt-RISeg is a standalone framework, we show that the autonomously generated segmentation masks can be used as prompts to vision foundation models for significantly improved performance.
Problem

Research questions and friction points this paper is trying to address.

How to segment unseen objects in real time without pre-trained models
How to use robot interactions to improve segmentation accuracy
How to avoid the poor generalization caused by overfitting to static visual features
Innovation

Methods, ideas, or system contributions that make the work stand out.

Real-time interactive perception framework rt-RISeg
Body frame-invariant feature (BFIF) for segmentation
No learned segmentation model; objects are identified from body-frame velocities induced by robot interaction
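The BFIF idea rests on a classical rigid-body fact: every frame rigidly attached to the same moving object shares the same angular velocity, and its linear velocity differs only by the lever-arm term, so the spatial twist is identical for all frames on that body. The sketch below illustrates this invariance under that assumption; the paper's exact BFIF formulation is not given on this page, and the function and variable names here are illustrative, not from the paper.

```python
import numpy as np

def bfif(p, v, w):
    """Sketch of a body frame-invariant feature: for a frame at
    position p with linear velocity v on a rigid body with angular
    velocity w, the spatial twist (w, v - w x p) is the same for
    every frame attached to that body."""
    return np.concatenate([w, v - np.cross(w, p)])

# Two frames sampled at different points on the same rigid body.
w = np.array([0.0, 0.0, 0.5])        # object angular velocity
v_body = np.array([0.1, 0.0, 0.0])   # velocity of the body point at the world origin
p1 = np.array([1.0, 0.0, 0.0])
p2 = np.array([0.0, 2.0, 0.0])
v1 = v_body + np.cross(w, p1)        # each frame's velocity includes the lever-arm term
v2 = v_body + np.cross(w, p2)

f1, f2 = bfif(p1, v1, w), bfif(p2, v2, w)
assert np.allclose(f1, f2)  # same rigid body -> same BFIF
```

Grouping sampled frames whose features agree (and separating those whose features differ) is what lets motion, rather than appearance, delineate object instances during an interaction.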
Howard H. Qian
Department of Computer Science, Rice University
Yiting Chen
Department of Computer Science, Rice University
Gaotian Wang
Department of Computer Science, Rice University
Podshara Chanrungmaneekul
Department of Computer Science, Rice University
Kaiyu Hang
Rice University
Robotic Grasping · Robotic Manipulation