🤖 AI Summary
This work addresses inter-class conflict and intra-class drift in multi-class open-vocabulary semantic segmentation. Both problems arise because evidence from different class prompts, and from synonymous expressions of the same class, does not share a consistent scale. To mitigate them, the authors propose a decoupled inference framework with two stages: intra-class enhancement followed by inter-class competition. First, evidence from synonymous prompts is aligned and aggregated to strengthen concept consistency; then, pixel-wise inter-class competition is performed on a unified evidence scale. This approach explicitly models the mechanism of concept conflict for the first time and improves the stability and accuracy of multi-class inference without requiring additional training. Built on SAM3, the method achieves consistent gains across eight open-vocabulary segmentation benchmarks.
📝 Abstract
SAM3 advances open-vocabulary semantic segmentation by introducing a prompt-driven mask generation paradigm. However, in multi-class open-vocabulary scenarios, masks generated independently from different category prompts lack a unified and inter-class comparable evidence scale, often resulting in overlapping coverage and unstable competition. Moreover, synonymous expressions of the same concept tend to activate inconsistent semantic and spatial evidence, leading to intra-class drift that exacerbates inter-class conflicts and compromises overall inference stability. To address these issues, we propose CoCo-SAM3 (Concept-Conflict SAM3), which explicitly decouples inference into intra-class enhancement and inter-class competition. Our method first aligns and aggregates evidence from synonymous prompts to strengthen concept consistency. It then performs inter-class competition on a unified comparable scale, enabling direct pixel-wise comparisons among all candidate classes. This mechanism stabilizes multi-class inference and effectively mitigates inter-class conflicts. Without requiring any additional training, CoCo-SAM3 achieves consistent improvements across eight open-vocabulary semantic segmentation benchmarks.
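The two-stage inference described above can be illustrated with a minimal NumPy sketch. This is not the CoCo-SAM3 implementation: the abstract does not specify how evidence is aligned or aggregated, so the sigmoid normalization, the mean over synonymous prompts, the background threshold, and all function names here are assumptions chosen for illustration. The per-prompt score maps are assumed to come from some prompt-driven mask generator; no SAM3 API is called.

```python
import numpy as np


def intra_class_enhance(prompt_logits: np.ndarray) -> np.ndarray:
    """Aggregate evidence from synonymous prompts of ONE class.

    prompt_logits: array of shape (S, H, W), one raw score map per synonym.
    Returns an (H, W) evidence map in [0, 1].
    Assumption: a sigmoid puts every prompt on the same scale, and a plain
    mean is used as the aggregation rule.
    """
    probs = 1.0 / (1.0 + np.exp(-prompt_logits))
    return probs.mean(axis=0)


def inter_class_compete(class_evidence: np.ndarray,
                        background_threshold: float = 0.5) -> np.ndarray:
    """Pixel-wise competition among all classes on a shared scale.

    class_evidence: array of shape (C, H, W).
    Returns an (H, W) integer label map; -1 marks pixels where no class
    reaches the (hypothetical) background threshold.
    """
    labels = class_evidence.argmax(axis=0)
    best = class_evidence.max(axis=0)
    return np.where(best >= background_threshold, labels, -1)


def segment(per_class_prompt_logits, background_threshold: float = 0.5):
    """Stage 1 (intra-class) for every class, then stage 2 (inter-class)."""
    evidence = np.stack(
        [intra_class_enhance(logits) for logits in per_class_prompt_logits]
    )
    return inter_class_compete(evidence, background_threshold)


# Toy 2x2 image: class 0 has two synonymous prompts, class 1 has one.
class0 = np.array([[[5.0, -5.0], [5.0, -5.0]],
                   [[5.0, -5.0], [-5.0, -5.0]]])
class1 = np.array([[[-5.0, 5.0], [1.0, -5.0]]])
label_map = segment([class0, class1])
print(label_map)
```

In the toy input, the two synonyms for class 0 disagree on the bottom-left pixel, so their averaged evidence drops to 0.5 and class 1 wins that pixel in the competition stage; the bottom-right pixel has weak evidence for every class and is left as background. Because each pixel receives exactly one label, overlapping coverage between classes cannot occur in the output.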