🤖 AI Summary
This work challenges the conventional dichotomy between discriminative and generative models, showing that standard discriminative models such as CLIP implicitly encode rich generative knowledge. To harness this, the authors propose Direct Ascent Synthesis (DAS), a training-free method that inverts CLIP representations by decomposing gradient ascent across spatial resolutions (1×1 to 224×224). Whereas naive single-resolution inversion collapses into adversarial patterns, the multi-scale decomposition steers optimization toward images with natural statistics, including the characteristic $1/f^2$ spectral decay. DAS enables zero-shot text-to-image generation and style transfer without any fine-tuning or adversarial training, achieving image quality approaching that of dedicated generative models while suppressing adversarial artifacts. By dispensing with explicit generative architectures and adversarial objectives, DAS blurs the boundary between discriminative and generative modeling, suggesting that high-fidelity synthesis can emerge directly from off-the-shelf discriminative representations.
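The multi-scale decomposition at the heart of DAS can be sketched with a toy objective. The real method performs gradient ascent on CLIP similarity; the snippet below swaps in a simple image-matching score ($-\tfrac{1}{2}\lVert\text{target}-\text{image}\rVert^2$) so it runs without any model. The helper names `upsample` and `das_sketch`, the scale list, and the learning rate are all illustrative choices, not the paper's implementation.

```python
import numpy as np

def upsample(x, size):
    """Nearest-neighbor upsample a square array to (size, size)."""
    r = size // x.shape[0]
    return np.kron(x, np.ones((r, r)))

def das_sketch(target, scales=(1, 2, 4, 8, 16, 32), steps=300, lr=0.1):
    """Toy multi-resolution gradient ascent: the image is the sum of
    per-scale components, each updated by its own gradient. DAS ascends
    CLIP similarity instead of this stand-in matching objective."""
    size = target.shape[0]
    comps = [np.zeros((s, s)) for s in scales]
    for _ in range(steps):
        img = sum(upsample(c, size) for c in comps)
        resid = target - img  # gradient of the objective w.r.t. the image
        for i, s in enumerate(scales):
            r = size // s
            # Backprop through nearest-neighbor upsampling is a block sum;
            # normalizing to a block mean keeps the step size stable.
            g = resid.reshape(s, r, s, r).sum(axis=(1, 3)) / (r * r)
            comps[i] += lr * g
    return sum(upsample(c, size) for c in comps)
```

Because each scale receives a block-averaged gradient, coarse components capture low-frequency structure while fine components add detail, which is the intuition behind the multi-scale decomposition steering optimization away from high-frequency adversarial noise.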
📝 Abstract
We demonstrate that discriminative models inherently contain powerful generative capabilities, challenging the fundamental distinction between discriminative and generative architectures. Our method, Direct Ascent Synthesis (DAS), reveals these latent capabilities through multi-resolution optimization of CLIP model representations. While traditional inversion attempts produce adversarial patterns, DAS achieves high-quality image synthesis by decomposing optimization across multiple spatial scales (1×1 to 224×224), requiring no additional training. This approach not only enables diverse applications -- from text-to-image generation to style transfer -- but also maintains natural image statistics ($1/f^2$ spectrum) and guides the generation away from non-robust adversarial patterns. Our results demonstrate that standard discriminative models encode substantially richer generative knowledge than previously recognized, providing new perspectives on model interpretability and the relationship between adversarial examples and natural image synthesis.
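The $1/f^2$ power-spectrum claim can be made concrete with a small numerical check: synthesize a random-phase signal with a $1/f$ amplitude spectrum (hence $1/f^2$ power, the statistic natural images approximately follow) and verify that its radially averaged power falls off with slope ≈ −2 in log-log coordinates. The helper names below are illustrative, not from the paper.

```python
import numpy as np

def radial_power_spectrum(img):
    """Average the 2-D power spectrum over annuli of equal radial frequency."""
    n = img.shape[0]
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    y, x = np.indices((n, n))
    r = np.hypot(x - n // 2, y - n // 2).astype(int)
    sums = np.bincount(r.ravel(), weights=power.ravel())
    counts = np.bincount(r.ravel())
    return sums / np.maximum(counts, 1)

def synth_natural_like(n=256, seed=0):
    """Random-phase image with a 1/f amplitude (~1/f^2 power) spectrum."""
    rng = np.random.default_rng(seed)
    f = np.hypot(np.fft.fftfreq(n)[:, None], np.fft.fftfreq(n)[None, :])
    f[0, 0] = 1.0  # avoid divide-by-zero at the DC bin
    spectrum = (1.0 / f) * np.exp(2j * np.pi * rng.random((n, n)))
    return np.fft.ifft2(spectrum).real

def spectral_slope(img, r_min=2, r_max=None):
    """Log-log slope of radially averaged power vs. radial frequency."""
    p = radial_power_spectrum(img)
    r_max = r_max or img.shape[0] // 4
    radii = np.arange(r_min, r_max)
    return np.polyfit(np.log(radii), np.log(p[radii]), 1)[0]
```

Applying `spectral_slope` to generated images is one way to verify that a synthesis method preserves the natural-image spectral decay rather than injecting the broadband high-frequency energy typical of adversarial patterns.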