GraphSeg: Segmented 3D Representations via Graph Edge Addition and Contraction

📅 2025-04-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing 2D segmentation models (e.g., SAM) suffer from over-segmentation and cross-view inconsistency when applied to 3D scenes in unstructured environments, especially without depth input. Method: This paper proposes a sparse multi-view 3D object segmentation framework that requires no depth information. It formulates 3D segmentation as edge addition and contraction operations on a graph structure. A dual-correspondence graph is constructed—comprising a 2D pixel similarity graph and an implicit 3D structural graph—and fused with outputs from 3D foundation models to yield object-level 3D representations. Using only a few 2D images, the method achieves cross-view-consistent segmentation via a graph neural network and a differentiable graph contraction algorithm. Contribution/Results: The approach achieves state-of-the-art accuracy on desktop-scale scenes, significantly reduces image requirements, and improves performance on downstream robotic manipulation tasks such as grasping.

📝 Abstract
Robots operating in unstructured environments often require accurate and consistent object-level representations. This typically requires segmenting individual objects from the robot's surroundings. While recent large models such as Segment Anything (SAM) offer strong performance in 2D image segmentation, these advances do not translate directly to the physical 3D world, where they often over-segment objects and fail to produce consistent mask correspondences across views. In this paper, we present GraphSeg, a framework for generating consistent 3D object segmentations from a sparse set of 2D images of the environment without any depth information. GraphSeg adds edges to graphs and constructs dual correspondence graphs: one from 2D pixel-level similarities and one from inferred 3D structure. We formulate segmentation as a problem of edge addition followed by graph contraction, which merges multiple 2D masks into unified object-level segmentations. We can then leverage 3D foundation models to produce segmented 3D representations. GraphSeg achieves robust segmentation with significantly fewer images and greater accuracy than prior methods. We demonstrate state-of-the-art performance on tabletop scenes and show that GraphSeg enables improved performance on downstream robotic manipulation tasks. Code is available at https://github.com/tomtang502/graphseg.git.
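The edge-addition-then-contraction idea from the abstract can be sketched in miniature: treat each per-view 2D mask as a graph node, add an edge whenever two masks are judged to belong to the same object, then contract connected components into unified object labels. The sketch below uses a simple union-find; the function name, the edge list, and the union-find choice are illustrative assumptions, not the paper's actual implementation (which uses a GNN and a differentiable contraction algorithm).

```python
# Illustrative sketch only: masks are nodes, edges assert "same object",
# and contraction merges connected masks into object-level clusters.
from collections import defaultdict

def contract(num_masks, edges):
    """Union-find contraction: masks linked by edges collapse to one object."""
    parent = list(range(num_masks))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    # Edge-addition step: each edge merges the two masks' components.
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Group mask indices by their contracted root -> object-level clusters.
    objects = defaultdict(list)
    for m in range(num_masks):
        objects[find(m)].append(m)
    return list(objects.values())

# Example: 5 masks across views; edges link (0,2), (2,4) and (1,3),
# so contraction yields two objects: {0, 2, 4} and {1, 3}.
print(contract(5, [(0, 2), (2, 4), (1, 3)]))
```

In the paper's setting the edges come from learned correspondences rather than being given, but the contraction step plays the same merging role.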
Problem

Research questions and friction points this paper is trying to address.

Generates consistent 3D object segmentations from 2D images
Addresses over-segmentation and inconsistent mask correspondences in 3D
Improves robotic manipulation via segmented 3D representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

GraphSeg uses graph edge addition for segmentation
Constructs dual correspondence graphs from 2D and 3D
Leverages 3D foundation models for segmented representations
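The dual-correspondence construction above can be sketched as a fusion of two cues: an edge between two masks is kept only when both the 2D pixel-similarity graph and the inferred 3D structural graph agree. The function, thresholds, and scores below are hypothetical placeholders for illustration; the paper's actual graphs are built from learned features and 3D foundation-model outputs.

```python
# Illustrative sketch only: fuse a 2D similarity cue and a 3D proximity cue.
# An edge (i, j) is added iff both cues agree the masks match.
import itertools

def dual_edges(sim_2d, dist_3d, sim_thresh=0.8, dist_thresh=0.05):
    """Keep edge (i, j) iff 2D similarity is high AND inferred 3D distance
    between the masks' centers is small."""
    n = len(sim_2d)
    edges = []
    for i, j in itertools.combinations(range(n), 2):
        if sim_2d[i][j] >= sim_thresh and dist_3d[i][j] <= dist_thresh:
            edges.append((i, j))
    return edges

# Masks 1 and 2 look similar in 2D (0.85) but are far apart in 3D (0.3),
# so only the (0, 1) edge survives the fused criterion.
sim = [[1.0, 0.9, 0.2],
       [0.9, 1.0, 0.85],
       [0.2, 0.85, 1.0]]
dist = [[0.0, 0.01, 0.4],
        [0.01, 0.0, 0.3],
        [0.4, 0.3, 0.0]]
print(dual_edges(sim, dist))
```

Requiring agreement between both graphs is what suppresses spurious 2D matches that are inconsistent with the scene's 3D structure.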
Haozhan Tang
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
Tianyi Zhang
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
Oliver Kroemer
Carnegie Mellon University - The Robotics Institute
Robotics, Machine Learning, Manipulation
Matthew Johnson-Roberson
Professor of Robotics, Carnegie Mellon University
Robotics, Field Robotics, Autonomous Vehicles, Marine Robotics
Weiming Zhi
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA