CLIPSym: Delving into Symmetry Detection with CLIP

📅 2025-08-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the low accuracy and poor generalization of rotation and reflection symmetry detection in natural images. It proposes CLIPSym, a CLIP-based framework with two main components: (1) Semantic-Aware Prompt Grouping (SAPG), which aggregates a diverse set of frequent object-based text prompts to bring semantic cues into symmetry detection; and (2) a rotation-equivariant decoder built from a hybrid of Transformer and G-Convolution layers. The approach combines CLIP's pre-trained image and language encoders with the equivariant decoder so that language-derived semantics and geometric structure both inform the prediction. CLIPSym outperforms the prior state of the art on three benchmarks: DENDI, SDRW, and LDRS. Ablation studies verify the benefits of CLIP pre-training, the equivariant decoder, and SAPG.

📝 Abstract
Symmetry is one of the most fundamental geometric cues in computer vision, and detecting it has been an ongoing challenge. With the recent advances in vision-language models such as CLIP, we investigate whether a pre-trained CLIP model can aid symmetry detection by leveraging the additional symmetry cues found in natural image descriptions. We propose CLIPSym, which combines CLIP's image and language encoders with a rotation-equivariant decoder, based on a hybrid of Transformer and $G$-Convolution, to detect rotation and reflection symmetries. To fully utilize CLIP's language encoder, we develop a novel prompting technique called Semantic-Aware Prompt Grouping (SAPG), which aggregates a diverse set of frequent object-based prompts to better integrate semantic cues for symmetry detection. Empirically, we show that CLIPSym outperforms the current state of the art on three standard symmetry detection datasets (DENDI, SDRW, and LDRS). Finally, we conduct detailed ablations verifying the benefits of CLIP's pre-training, the proposed equivariant decoder, and the SAPG technique. The code is available at https://github.com/timyoung2333/CLIPSym.
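To make the idea of a rotation-equivariant $G$-Convolution concrete, here is a minimal NumPy sketch of a lifting convolution over the four 90-degree rotations (the group C4). It is not the CLIPSym decoder: the group choice, the function names (`correlate_valid`, `c4_lifting_conv`, `rotate_c4_features`), and the random inputs are all assumptions made for illustration. The sketch checks numerically that rotating the input image rotates each output map and cyclically shifts the rotation channels.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate_valid(image, kernel):
    """Plain 2D cross-correlation without padding ('valid' mode)."""
    windows = sliding_window_view(image, kernel.shape)  # (H', W', k, k)
    return np.einsum("ijab,ab->ij", windows, kernel)


def c4_lifting_conv(image, kernel):
    """Correlate the image with the kernel rotated by 0, 90, 180, 270 degrees.

    Returns an array of shape (4, H', W'), one map per rotation.
    """
    return np.stack(
        [correlate_valid(image, np.rot90(kernel, r)) for r in range(4)]
    )


def rotate_c4_features(features):
    """Action of a 90-degree rotation on the lifted feature maps:
    rotate every map spatially, then shift the rotation channels by one."""
    rotated = np.rot90(features, k=1, axes=(1, 2))
    return np.roll(rotated, shift=1, axis=0)


# Hypothetical random data, used only to exercise the property.
rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
kernel = rng.standard_normal((3, 3))

# Equivariance: conv(rotate(image)) == rotate(conv(image)).
lhs = c4_lifting_conv(np.rot90(image), kernel)
rhs = rotate_c4_features(c4_lifting_conv(image, kernel))
equivariant = np.allclose(lhs, rhs)
```

An ordinary convolution satisfies this kind of identity only for translations; sharing one kernel across its rotated copies is what extends it to rotations. Handling reflections as well would require a larger group, which this sketch does not cover.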
Problem

Research questions and friction points this paper is trying to address.

Detecting symmetry in images using the CLIP model
Leveraging vision-language cues for symmetry detection
Improving rotation and reflection symmetry recognition accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages CLIP's encoders for symmetry detection
Uses a rotation-equivariant decoder combining Transformer and $G$-Convolution layers
Introduces the Semantic-Aware Prompt Grouping (SAPG) technique
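
As a rough illustration of aggregating several text prompts into one semantic cue, the sketch below averages normalized prompt embeddings and scores image patches against the result by cosine similarity. The page does not specify how SAPG combines prompts, so mean pooling is swapped in here as a stand-in; the random arrays replace real CLIP embeddings, and the names `aggregate_prompts` and `similarity_map` are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def aggregate_prompts(prompt_embeddings):
    """Mean-pool normalized prompt embeddings (assumed aggregation rule).

    prompt_embeddings: (num_prompts, dim) -> returns (dim,)
    """
    return l2_normalize(l2_normalize(prompt_embeddings).mean(axis=0))


def similarity_map(patch_features, text_embedding):
    """Cosine similarity between each image patch and the text embedding.

    patch_features: (H, W, dim), text_embedding: (dim,) -> returns (H, W)
    """
    return l2_normalize(patch_features) @ text_embedding


# Random stand-ins for CLIP text and image-patch embeddings (assumption).
rng = np.random.default_rng(0)
prompt_embeddings = rng.standard_normal((5, 16))  # 5 prompts, 16-dim
patch_features = rng.standard_normal((4, 4, 16))  # 4x4 grid of patches

grouped = aggregate_prompts(prompt_embeddings)
heatmap = similarity_map(patch_features, grouped)
```

With real encoders, `prompt_embeddings` would come from a text encoder applied to object-based prompts and `patch_features` from an image encoder; both are left as placeholders here.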