🤖 AI Summary
This paper addresses the low accuracy and poor generalization of rotation and reflection symmetry detection in natural images. It proposes CLIPSym, a CLIP-based, semantically guided detection framework with two components: (1) Semantic-Aware Prompt Grouping (SAPG), which aggregates a diverse set of frequent object-based text prompts to strengthen the semantic cues available for symmetry detection; and (2) a rotation-equivariant decoder, built as a hybrid of Transformer and G-Convolution layers, that detects rotation and reflection symmetries. The approach combines CLIP's cross-modal priors with group-equivariant architectural constraints, so that linguistic semantics and geometric symmetry are modeled jointly. CLIPSym outperforms the previous state of the art on three benchmarks: DENDI, SDRW, and LDRS. Ablation studies verify the benefits of CLIP pre-training, the equivariant decoder, and SAPG.
📝 Abstract
Symmetry is one of the most fundamental geometric cues in computer vision, and detecting it has been an ongoing challenge. With the recent advances in vision-language models such as CLIP, we investigate whether a pre-trained CLIP model can aid symmetry detection by leveraging the additional symmetry cues found in natural image descriptions. We propose CLIPSym, which leverages CLIP's image and language encoders and a rotation-equivariant decoder based on a hybrid of Transformer and $G$-Convolution to detect rotation and reflection symmetries. To fully utilize CLIP's language encoder, we develop a novel prompting technique called Semantic-Aware Prompt Grouping (SAPG), which aggregates a diverse set of frequent object-based prompts to better integrate the semantic cues for symmetry detection. Empirically, we show that CLIPSym outperforms the current state of the art on three standard symmetry detection datasets (DENDI, SDRW, and LDRS). Finally, we conduct detailed ablations verifying the benefits of CLIP's pre-training, the proposed equivariant decoder, and the SAPG technique. The code is available at https://github.com/timyoung2333/CLIPSym.
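To make the notion of a rotation-equivariant $G$-Convolution concrete, the following is a minimal NumPy sketch of a lifting convolution over the group C4 (rotations by multiples of 90 degrees). It is an illustration of the general property only, not CLIPSym's decoder: the function names, the choice of C4, and the toy input sizes are all assumptions made here, and reflections are not covered. Rotating the input rotates each output map and cyclically shifts the rotation channels, so no information about orientation is lost.

```python
import numpy as np


def correlate2d_valid(image, kernel):
    """Plain 'valid' cross-correlation of a 2D image with a 2D kernel."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def c4_lifting_conv(image, kernel):
    """Hypothetical C4 lifting layer: correlate the image with the kernel
    rotated by 0, 90, 180, and 270 degrees. Output shape: (4, H', W')."""
    return np.stack(
        [correlate2d_valid(image, np.rot90(kernel, k)) for k in range(4)]
    )


def rotate_group_features(features):
    """Action of a 90-degree rotation on C4 feature maps: shift the
    rotation channels by one and rotate each map spatially."""
    return np.rot90(np.roll(features, 1, axis=0), axes=(1, 2))


# Toy data (assumed sizes; square inputs keep shapes unchanged under rot90).
rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))

features = c4_lifting_conv(image, kernel)
features_of_rotated = c4_lifting_conv(np.rot90(image), kernel)

# Equivariance: convolving the rotated image equals transforming the features.
is_equivariant = np.allclose(features_of_rotated, rotate_group_features(features))

# Pooling over the rotation axis yields a single map that rotates with the input.
pooled_is_equivariant = np.allclose(
    features_of_rotated.max(axis=0), np.rot90(features.max(axis=0))
)
```

Both checks hold for any square image and square kernel, because rotating the image and the kernel together rotates the correlation output. An ordinary convolution with a fixed kernel lacks this guarantee, which is the motivation for building the decoder from group-equivariant layers.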