StAR: Segment Anything Reasoner

📅 2026-03-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing reasoning-based segmentation methods struggle to effectively harness the implicit query-driven localization capabilities of foundation models in complex scenes. To address this limitation, this work proposes the StAR framework, which activates the latent visual reasoning abilities of foundation models through parameter-efficient fine-tuning, a tailored reward function, and a selective tuning strategy. Notably, StAR introduces parallel test-time scaling to segmentation tasks for the first time. The study also presents ReasonSeg-X, a new dataset encompassing diverse and deep reasoning types, along with a fine-grained evaluation benchmark. Furthermore, a rollout-expanded selective-tuning training strategy is devised. Remarkably, with only 5,000 training samples, StAR substantially outperforms existing approaches across multiple benchmarks, demonstrating its efficacy in unlocking the visual reasoning potential of foundation models.

📝 Abstract
As AI systems are integrated ever more rapidly into diverse and complex real-world environments, the ability to perform holistic reasoning over an implicit query and an image to localize a target is becoming increasingly important. However, recent reasoning segmentation methods fail to sufficiently elicit the visual reasoning capabilities of the base model. In this work, we present Segment Anything Reasoner (StAR), a comprehensive framework that refines the design space from multiple perspectives, including the parameter-tuning scheme, reward functions, learning strategies, and answer format, and achieves substantial improvements over recent baselines. In addition, for the first time, we successfully introduce parallel test-time scaling to the segmentation task, pushing the performance boundary even further. To extend the scope and depth of reasoning covered by existing benchmarks, we also construct ReasonSeg-X, which compactly defines reasoning types and includes samples that require deeper reasoning. Leveraging this dataset, we train StAR with a rollout-expanded selective-tuning approach to activate the base model's latent reasoning capabilities, and establish a rigorous benchmark for systematic, fine-grained evaluation of advanced methods. With only 5k training samples, StAR achieves significant gains over its base counterparts across extensive benchmarks, demonstrating that our method effectively brings dormant reasoning competence to the surface.
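The abstract does not spell out how parallel test-time scaling is realized for segmentation. As a generic, hedged illustration of the idea only (the function name and the pixel-wise majority-vote aggregation rule are assumptions for exposition, not the paper's actual method), one could sample N candidate masks in parallel and aggregate them:

```python
import numpy as np

def aggregate_masks(candidate_masks, threshold=0.5):
    """Illustrative parallel test-time scaling for segmentation:
    aggregate N independently sampled binary masks (each H x W)
    by pixel-wise majority vote. NOT the paper's actual scheme."""
    stacked = np.stack(candidate_masks).astype(np.float32)  # shape (N, H, W)
    # A pixel is foreground if at least `threshold` of the rollouts agree.
    return (stacked.mean(axis=0) >= threshold).astype(np.uint8)
```

Other aggregation rules (e.g., picking the single highest-scoring rollout) fit the same parallel-sampling template; the vote above is just the simplest self-contained variant.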
Problem

Research questions and friction points this paper is trying to address.

reasoning segmentation
visual reasoning
holistic reasoning
segmentation task
reasoning capability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Segment Anything Reasoner
parallel test-time scaling
reasoning segmentation
selective-tuning
ReasonSeg-X
Seokju Yun
University of Seoul
representation learning · multi-modal learning · 3D/4D generation
Dongheon Lee
University of Seoul
Noori Bae
University of Seoul
Jaesung Jun
University of Seoul
Chanseul Cho
University of Seoul
Youngmin Ro
Assistant Professor, University of Seoul
deep learning · computer vision