Beyond Visual Cues: Leveraging General Semantics as Support for Few-Shot Segmentation

📅 2025-11-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Conventional few-shot segmentation (FSS) suffers from weak meta-knowledge generalization due to intra-class visual variability, since it relies heavily on limited and biased support image features. Method: This paper proposes a language-driven, bias-free semantic generalization paradigm that replaces visual support features with fine-grained, attribute-rich textual descriptions, generated by large language models, as semantic priors. It introduces a multi-attribute enhancement module and a cross-modal alignment mechanism to jointly optimize and deeply fuse textual semantics with visual features. Contribution/Results: The framework substantially alleviates dependence on scarce, visually biased support samples. It achieves state-of-the-art performance on standard benchmarks including PASCAL-5i and COCO-20i, and provides the first systematic empirical validation of the effectiveness and scalability of generic semantic priors in few-shot segmentation.

📝 Abstract
Few-shot segmentation (FSS) aims to segment novel classes under the guidance of limited support samples via a meta-learning paradigm. Existing methods mainly mine references from support images as meta guidance. However, due to intra-class variations among visual representations, the meta information extracted from support images cannot provide accurate guidance for segmenting untrained classes. In this paper, we argue that references from support images may not be essential; the key to the support role is to provide unbiased meta guidance for both trained and untrained classes. We therefore introduce a Language-Driven Attribute Generalization (LDAG) architecture that exploits language descriptions of inherent target properties to build a robust support strategy. Specifically, to obtain an unbiased support representation, we design a Multi-attribute Enhancement (MaE) module, which produces multiple detailed attribute descriptions of the target class through Large Language Models (LLMs) and then builds refined visual-text prior guidance via multi-modal matching. Meanwhile, because of the text-vision modality shift, attribute text alone struggles to promote visual feature representation, so we design a Multi-modal Attribute Alignment (MaA) module to achieve cross-modal interaction between attribute texts and visual features. Experiments show that our proposed method outperforms existing approaches by a clear margin and achieves new state-of-the-art performance. The code will be released.
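As a toy illustration of the MaE idea described above (matching several attribute descriptions against a visual feature map to form an unbiased prior), the core operation can be sketched in NumPy. This is a hedged sketch, not the paper's implementation: the embeddings are random stand-ins for what a text encoder (for the LLM-generated attribute descriptions) and a visual backbone would produce, and the max-fusion and normalization choices are assumptions.

```python
import numpy as np

def attribute_prior(visual_feats, attr_embeds):
    """Fuse per-attribute cosine-similarity maps into one prior guidance map.

    visual_feats: (H, W, D) visual feature map from a backbone (stand-in here)
    attr_embeds:  (K, D) embeddings of K attribute descriptions of the class
    Returns an (H, W) guidance map rescaled to [0, 1].
    """
    # L2-normalize both modalities so dot products become cosine similarities
    v = visual_feats / (np.linalg.norm(visual_feats, axis=-1, keepdims=True) + 1e-8)
    t = attr_embeds / (np.linalg.norm(attr_embeds, axis=-1, keepdims=True) + 1e-8)
    sims = np.einsum("hwd,kd->hwk", v, t)   # (H, W, K): one map per attribute
    prior = sims.max(axis=-1)               # keep the strongest attribute match per location
    return (prior - prior.min()) / (prior.max() - prior.min() + 1e-8)

# Toy usage with random stand-in features
H, W, D, K = 8, 8, 16, 5
rng = np.random.default_rng(0)
prior = attribute_prior(rng.normal(size=(H, W, D)), rng.normal(size=(K, D)))
print(prior.shape)  # (8, 8)
```

Using the per-location maximum over attributes reflects the intuition that a pixel belongs to the target class if it matches any of its described properties; mean-fusion is an equally plausible choice.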
Problem

Research questions and friction points this paper is trying to address.

Addresses inaccurate guidance in few-shot segmentation due to visual variations
Utilizes language descriptions to build robust support strategy for segmentation
Solves cross-modal alignment between attribute texts and visual features
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using language descriptions for robust segmentation support
Multi-attribute Enhancement module generates detailed class attributes
Multi-modal Alignment bridges text and visual feature interaction
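The alignment idea in the last bullet, letting visual features interact with attribute texts to bridge the modality shift, can be sketched as a single-head cross-attention step in NumPy. This is an assumed, minimal formulation (not the paper's MaA module): visual tokens act as queries over the attribute embeddings, and the attended text semantics are fused back via a residual connection.

```python
import numpy as np

def cross_modal_align(visual_feats, attr_embeds):
    """Single-head cross-attention: visual tokens attend to attribute texts.

    visual_feats: (N, D) flattened visual tokens (queries)
    attr_embeds:  (K, D) attribute text embeddings (keys and values)
    Returns (N, D): visual tokens enriched with attended text semantics.
    """
    d = visual_feats.shape[-1]
    scores = visual_feats @ attr_embeds.T / np.sqrt(d)   # (N, K) scaled dot products
    scores -= scores.max(axis=-1, keepdims=True)         # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over the K attributes
    attended = weights @ attr_embeds                     # (N, D) text-conditioned features
    return visual_feats + attended                       # residual fusion of both modalities

# Toy usage with random stand-in tokens
rng = np.random.default_rng(1)
out = cross_modal_align(rng.normal(size=(64, 16)), rng.normal(size=(5, 16)))
print(out.shape)  # (64, 16)
```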
Jin Wang
School of Control Science and Engineering, China University of Petroleum (East China), 66 West Changjiang Road, Qingdao, 266580, Shandong, China.
Bingfeng Zhang
School of Control Science and Engineering, China University of Petroleum (East China), 66 West Changjiang Road, Qingdao, 266580, Shandong, China.
Jian Pang
Geely Automobile Research Institute, Zhejiang Geely Holding Group Co., Ltd., 818 Binhai 2nd Road, Ningbo, 315000, Zhejiang, China.
Mengyu Liu
School of Control Science and Engineering, China University of Petroleum (East China), 66 West Changjiang Road, Qingdao, 266580, Shandong, China.
Honglong Chen
School of Control Science and Engineering, China University of Petroleum (East China), 66 West Changjiang Road, Qingdao, 266580, Shandong, China.
Weifeng Liu
University of Florida
Machine Learning · Signal Processing · Kernel adaptive filtering