🤖 AI Summary
To address weak generalization and the lack of explicit reasoning in open-domain referring segmentation, this paper proposes Seg-Zero, a zero-shot reasoning segmentation framework. It employs a decoupled dual-model architecture comprising a reasoning model and a segmentation model: the reasoning model interprets the user's intent, generates chain-of-thought explanations, and produces positional prompts, which guide the segmentation model to produce pixel-level masks. Crucially, the authors train the reasoning model with GRPO-based reinforcement learning, without any explicit reasoning supervision, using a joint reward mechanism that combines format fidelity and segmentation accuracy to elicit emergent test-time reasoning. On the ReasonSeg benchmark, Seg-Zero-7B achieves a zero-shot mIoU of 57.5, surpassing LISA-7B by 18% and demonstrating substantial gains in cross-domain generalization and reasoning interpretability.
📝 Abstract
Traditional methods for reasoning segmentation rely on supervised fine-tuning with categorical labels and simple descriptions, limiting their out-of-domain generalization and lacking explicit reasoning processes. To address these limitations, we propose Seg-Zero, a novel framework that demonstrates remarkable generalizability and derives explicit chain-of-thought reasoning through cognitive reinforcement. Seg-Zero introduces a decoupled architecture consisting of a reasoning model and a segmentation model. The reasoning model interprets user intentions, generates explicit reasoning chains, and produces positional prompts, which are subsequently used by the segmentation model to generate precise pixel-level masks. We design a sophisticated reward mechanism that integrates both format and accuracy rewards to effectively guide optimization directions. Trained exclusively via reinforcement learning with GRPO and without explicit reasoning data, Seg-Zero achieves robust zero-shot generalization and exhibits emergent test-time reasoning capabilities. Experiments show that Seg-Zero-7B achieves a zero-shot performance of 57.5 on the ReasonSeg benchmark, surpassing the prior LISA-7B by 18%. This significant improvement highlights Seg-Zero's ability to generalize across domains while presenting an explicit reasoning process. Code is available at https://github.com/dvlab-research/Seg-Zero.
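The abstract's joint reward, combining a format reward and an accuracy reward, can be sketched as follows. This is a minimal illustrative approximation, not the paper's exact implementation: the `<think>`/`<answer>` template, the IoU threshold of 0.5, and the equal weighting of the two terms are all assumptions made here for clarity.

```python
import re


def format_reward(response: str) -> float:
    """Reward 1.0 when the response follows a reasoning-then-answer
    template (hypothetical template; the paper may use a different one)."""
    pattern = r"<think>.+?</think>\s*<answer>.+?</answer>"
    return 1.0 if re.search(pattern, response, re.DOTALL) else 0.0


def iou(box_a, box_b) -> float:
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def joint_reward(response: str, pred_box, gt_box, iou_thresh: float = 0.5) -> float:
    """Sum of format and accuracy rewards; the accuracy term fires only
    when the predicted positional prompt overlaps the ground truth enough
    (threshold and weighting are illustrative assumptions)."""
    accuracy = 1.0 if iou(pred_box, gt_box) >= iou_thresh else 0.0
    return format_reward(response) + accuracy
```

In a GRPO-style loop, this scalar would score each sampled rollout, so well-formatted responses that also localize the target correctly are preferred without ever supervising the reasoning text itself.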