🤖 AI Summary
Existing articulated-object recognition methods often rely on priors about the number of parts, on depth-image inputs, or on complex intermediate representations, which limits their generalizability and practical applicability. This paper introduces ScrewSplat, an end-to-end framework for estimating the kinematic structure of articulated objects from RGB observations alone. The approach randomly initializes screw axes and optimizes them together with a Gaussian Splatting-based 3D reconstruction, requiring neither geometric priors nor depth input, so that motion axes, rigid-part segmentation, and scene geometry are recovered jointly. Evaluated on standard benchmarks, the method achieves state-of-the-art recognition accuracy and further supports zero-shot, text-guided manipulation, improving robustness and generalizability for articulated-object understanding in real-world settings.
📝 Abstract
Articulated object recognition -- the task of identifying both the geometry and kinematic joints of objects with movable parts -- is essential for enabling robots to interact with everyday objects such as doors and laptops. However, existing approaches often rely on strong assumptions, such as a known number of articulated parts; require additional inputs, such as depth images; or involve complex intermediate steps that can introduce potential errors -- limiting their practicality in real-world settings. In this paper, we introduce ScrewSplat, a simple end-to-end method that operates solely on RGB observations. Our approach begins by randomly initializing screw axes, which are then iteratively optimized to recover the object's underlying kinematic structure. By integrating with Gaussian Splatting, we simultaneously reconstruct the 3D geometry and segment the object into rigid, movable parts. We demonstrate that our method achieves state-of-the-art recognition accuracy across a diverse set of articulated objects, and further enables zero-shot, text-guided manipulation using the recovered kinematic model.
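To make the screw-axis representation concrete, here is a minimal sketch of a screw transform in the style the abstract describes. The function names and the revolute-joint example are our own illustration, not the paper's code; ScrewSplat's actual pipeline optimizes these axis parameters jointly with Gaussian Splatting rather than assuming them known.

```python
import numpy as np

def skew(w):
    """3x3 skew-symmetric matrix such that skew(w) @ p == np.cross(w, p)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def screw_transform(omega, q, pitch, theta):
    """Homogeneous transform for a screw motion: rotate by `theta` about the
    axis with unit direction `omega` passing through point `q`, while sliding
    pitch*theta along the axis (pitch=0 gives a pure revolute joint, e.g. a
    laptop hinge; a pure prismatic joint, e.g. a drawer, is the limit with
    no rotation)."""
    W = skew(omega)
    # Rodrigues' formula for the rotation part.
    R = np.eye(3) + np.sin(theta) * W + (1.0 - np.cos(theta)) * (W @ W)
    # Translation: account for the axis being off the origin, then slide.
    t = (np.eye(3) - R) @ q + pitch * theta * omega
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Example: move a point on a movable part by 90 degrees about a hinge
# (the z-axis through the origin, zero pitch -> revolute joint).
T = screw_transform(np.array([0.0, 0.0, 1.0]), np.zeros(3), 0.0, np.pi / 2)
p = T @ np.array([1.0, 0.0, 0.0, 1.0])  # point (1, 0, 0) in homogeneous form
```

In this parameterization each joint is a handful of continuous values (axis direction, a point on the axis, pitch, and a joint angle), which is what makes random initialization followed by gradient-based optimization, as in the abstract, workable.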