CLIPDrawX: Primitive-based Explanations for Text Guided Sketch Synthesis

📅 2023-12-04
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the interpretability of CLIP text embeddings by proposing a minimalist, geometry-driven visualization method. It synthesizes semantically aligned sketches using only elementary geometric primitives (straight lines and circles) and their affine transformations (translation, scaling, rotation), optimized end-to-end via differentiable rendering and gradient-based refinement against the CLIP text embedding. Unlike prior approaches that optimize over Bézier curves, the method restricts its outputs to linear transformations of these primitives, which gives the search space an inherently simpler mathematical form. This enables concept decomposition, traceable optimization trajectories, and structural attribution of generated sketches. According to the authors, the method produces simpler and significantly better visualizations of CLIP text embeddings than Bézier-based approaches, improving transparency into CLIP’s internal behavior.
📝 Abstract
With the goal of understanding the visual concepts that CLIP associates with text prompts, we show that the latent space of CLIP can be visualized solely in terms of linear transformations on simple geometric primitives like circles and straight lines. Although existing approaches achieve this by sketch-synthesis-through-optimization, they do so on the space of Bézier curves, which exhibit a wastefully large set of structures that they can evolve into, as most of them are non-essential for generating meaningful sketches. We present CLIPDrawX, an algorithm that provides significantly better visualizations for CLIP text embeddings, using only simple primitive shapes like straight lines and circles. This constrains the set of possible outputs to linear transformations on these primitives, thereby exhibiting an inherently simpler mathematical form. The synthesis process of CLIPDrawX can be tracked end-to-end, with each visual concept being explained exclusively in terms of primitives. Implementation will be released upon acceptance. Project Page: https://clipdrawx.github.io/.
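To make the idea of "transformations on simple primitives" concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation, which has not been released. The function names (`affine_matrix`, `unit_line`, `unit_circle`, `transform`), the four-parameter layout (translation, uniform scale, rotation), and the choice of canonical primitives are all assumptions made for illustration.

```python
import numpy as np

def affine_matrix(tx=0.0, ty=0.0, scale=1.0, theta=0.0):
    """3x3 homogeneous matrix: rotate by theta, scale uniformly, then translate.

    Hypothetical parameterization; the paper may use a different one.
    """
    c, s = np.cos(theta), np.sin(theta)
    return np.array([
        [scale * c, -scale * s, tx],
        [scale * s,  scale * c, ty],
        [0.0,        0.0,       1.0],
    ])

def unit_line(n=2):
    """Canonical line primitive: n points from (-0.5, 0) to (0.5, 0)."""
    xs = np.linspace(-0.5, 0.5, n)
    return np.stack([xs, np.zeros(n)], axis=1)

def unit_circle(n=64):
    """Canonical circle primitive: n points on the unit circle at the origin."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return np.stack([np.cos(t), np.sin(t)], axis=1)

def transform(points, M):
    """Apply a 3x3 homogeneous matrix to an (n, 2) array of points."""
    homo = np.hstack([points, np.ones((len(points), 1))])
    return (homo @ M.T)[:, :2]

# A sketch is then just a list of (primitive, parameters) pairs.
line = transform(unit_line(), affine_matrix(tx=1.0, ty=1.0, scale=2.0, theta=np.pi / 2))
circle = transform(unit_circle(), affine_matrix(tx=2.0, ty=-1.0, scale=3.0))
```

Under this assumed parameterization, every stroke is fully described by four numbers, which is what makes each primitive easy to inspect compared with the control points of a free-form curve.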
Problem

Research questions and friction points this paper is trying to address.

Visualizing CLIP text embeddings with simple primitives
Reducing complexity by using lines and circles only
Tracking end-to-end sketch synthesis for clarity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses simple geometric primitives like lines and circles
Constrains outputs to linear transformations
Enables end-to-end synthesis tracking
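
As a rough illustration of what tracking the synthesis end-to-end could look like, the sketch below logs every primitive's parameters at each gradient step. It is a toy under stated assumptions: the quadratic `toy_loss_and_grad` is a stand-in for the CLIP-based objective (it does not call CLIP or a differentiable renderer), and the names `optimize_with_trace`, `lr`, and `steps` are hypothetical.

```python
import numpy as np

def toy_loss_and_grad(params, target):
    """Stand-in objective (NOT CLIP): squared distance to a target parameter array."""
    diff = params - target
    return float(np.sum(diff ** 2)), 2.0 * diff

def optimize_with_trace(init, target, lr=0.1, steps=50):
    """Plain gradient descent that records the parameters after every step.

    params has shape (num_primitives, 4), assumed to hold (tx, ty, scale, theta).
    Returns an array of shape (steps + 1, num_primitives, 4).
    """
    params = np.asarray(init, dtype=float).copy()
    trace = [params.copy()]
    for _ in range(steps):
        _, grad = toy_loss_and_grad(params, target)
        params = params - lr * grad
        trace.append(params.copy())
    return np.stack(trace)

# Two primitives, each starting from all-zero parameters.
init = np.zeros((2, 4))
target = np.array([[1.0, 2.0, 0.5, 0.3], [-1.0, 0.0, 2.0, -0.7]])
trace = optimize_with_trace(init, target)
# trace[:, i, :] is the full trajectory of primitive i across the optimization.
```

Because the trace is indexed by step and by primitive, the evolution of any single line or circle can be inspected or plotted on its own.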