S2-UniSeg: Fast Universal Agglomerative Pooling for Scalable Segment Anything without Supervision

📅 2025-08-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing self-supervised image segmentation methods rely on multi-stage training with offline pseudo-mask generation, resulting in poor scalability and discontinuous optimization. This paper proposes S2-UniSeg, an end-to-end trainable universal segmentation framework. It introduces UniAP, a millisecond-level pseudo-mask generation algorithm, and combines query-wise self-distillation with a momentum-based teacher-student architecture to jointly model semantic and instance segmentation at multiple granularities on SA-1B. By eliminating the costly offline step, S2-UniSeg enables continuous optimization and efficient large-scale training. Extensive experiments demonstrate consistent superiority over UnSAM: +6.9 AP on COCO, +11.1 AR on UVO, +4.5 Pixel Accuracy on COCOStuff-27, and +8.0 RQ on Cityscapes. Performance further improves after scaling up to a 2M-image subset of SA-1B.

📝 Abstract
Recent self-supervised image segmentation models have achieved promising performance on semantic segmentation and class-agnostic instance segmentation. However, their pretraining schedule is multi-stage, requiring a time-consuming pseudo-mask generation process between training epochs. This offline process not only makes it difficult to scale with training dataset size, but also leads to sub-optimal solutions due to its discontinuous optimization routine. To solve these issues, we first present a novel pseudo-mask algorithm, Fast Universal Agglomerative Pooling (UniAP). Each layer of UniAP can identify groups of similar nodes in parallel, allowing it to generate semantic-level, instance-level, and multi-granular pseudo-masks within tens of milliseconds per image. Based on the fast UniAP, we propose Scalable Self-Supervised Universal Segmentation (S2-UniSeg), which employs a student and a momentum teacher for continuous pretraining. A novel segmentation-oriented pretext task, Query-wise Self-Distillation (QuerySD), is proposed to pretrain S2-UniSeg to learn local-to-global correspondences. Under the same setting, S2-UniSeg outperforms the SOTA UnSAM model, achieving notable improvements of +6.9 AP on COCO, +11.1 AR on UVO, +4.5 PixelAcc on COCOStuff-27, and +8.0 RQ on Cityscapes. After scaling up to a larger 2M-image subset of SA-1B, S2-UniSeg achieves further performance gains on all four benchmarks. Our code and pretrained models are available at https://github.com/bio-mlhui/S2-UniSeg
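The abstract's core idea, a pooling layer that "identifies groups of similar nodes", can be illustrated with a minimal sketch. This is not the paper's UniAP implementation; it is a generic agglomerative-grouping step under assumed choices (cosine similarity, a fixed merge threshold, union-find for transitive merging, mean pooling per group):

```python
# Illustrative sketch, NOT the paper's UniAP algorithm: one agglomerative
# layer that merges feature "nodes" whose cosine similarity exceeds a
# threshold, then mean-pools the features of each resulting group.
import numpy as np

def agglomerative_pool(feats: np.ndarray, thresh: float = 0.9):
    """feats: (N, D) node features. Returns (group_ids, pooled_feats)."""
    n = feats.shape[0]
    normed = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    sim = normed @ normed.T  # pairwise cosine similarity

    # Union-find so that similarity merges are transitive.
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i, j] > thresh:
                parent[find(i)] = find(j)

    roots = [find(i) for i in range(n)]
    uniq = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    group_ids = np.array([uniq[r] for r in roots])
    pooled = np.stack([feats[group_ids == g].mean(axis=0)
                       for g in range(len(uniq))])
    return group_ids, pooled
```

Stacking such layers with progressively looser thresholds would yield coarser groups at each level, which is one way masks at multiple granularities could emerge; the actual UniAP layer is parallelized and differs in its grouping rule.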
Problem

Research questions and friction points this paper is trying to address.

Eliminates the time-consuming offline pseudo-mask generation stage in self-supervised segmentation
Enables scalable training without discontinuous optimization routines
Generates multi-granular segmentation masks within milliseconds per image
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fast Universal Agglomerative Pooling algorithm
Student-teacher continuous pretraining architecture
Query-wise Self-Distillation pretext task
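The momentum teacher in the student-teacher pretraining setup is typically maintained with an exponential moving average of the student's weights (as in BYOL/DINO-style self-distillation). A minimal sketch, with the momentum coefficient 0.999 an assumed value rather than one taken from the paper:

```python
# Sketch of a standard EMA momentum-teacher update (assumption: S2-UniSeg
# follows the common BYOL/DINO-style scheme; the coefficient is illustrative).
def update_teacher(teacher_params, student_params, momentum=0.999):
    """EMA update: teacher <- momentum * teacher + (1 - momentum) * student."""
    return [t * momentum + s * (1.0 - momentum)
            for t, s in zip(teacher_params, student_params)]
```

Because the teacher changes slowly, it provides stable targets for the student's query-wise self-distillation loss throughout continuous pretraining.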
Huihui Xu
Shanghai Artificial Intelligence Laboratory
Jin Ye
Shanghai Artificial Intelligence Laboratory
Hongqiu Wang
Hong Kong University of Science and Technology (Guangzhou)
AI for healthcare, Label-efficient learning, Multi-modal learning, Fairness, MLLM
Changkai Ji
Shanghai Artificial Intelligence Laboratory
Jiashi Lin
Shanghai Artificial Intelligence Laboratory
Ming Hu
Shanghai Artificial Intelligence Laboratory
Ziyan Huang
Shanghai Artificial Intelligence Laboratory
Ying Chen
Shanghai Artificial Intelligence Laboratory
Chenglong Ma
Fudan University; Shanghai Innovation Institute
multi-modal models, generative models, medical image analysis
Tianbin Li
Shanghai Artificial Intelligence Laboratory
Machine Learning, Computer Vision, General Intelligence
Lihao Liu
Amazon
LLM-based Agent, Healthcare AI
Junjun He
Shanghai Jiao Tong University
Lei Zhu
The Hong Kong University of Science and Technology