Negative-Guided Subject Fidelity Optimization for Zero-Shot Subject-Driven Generation

📅 2025-06-04

📈 Citations: 0

✨ Influential: 0

career value

154K/year

🤖 AI Summary

To address the insufficient subject fidelity and text-image alignment in zero-shot subject-driven image generation, this paper proposes a negative-sample-guided contrastive learning framework. The method operates entirely on pre-trained diffusion models without fine-tuning or additional supervision. Its core contributions are: (1) a Conditionally Degraded Negative Sampling (CDNS) strategy that automatically generates semantically relevant yet identity-mismatched negative samples—without manual annotation; and (2) a dynamic timestep reweighting mechanism that strengthens the diffusion model’s capacity to model subject-specific features during critical detail-generation timesteps. Evaluated across multiple subject-driven benchmarks, the approach achieves significant improvements in identity fidelity (+12.7%) and text-image alignment (+9.3%), marking the first zero-shot method to jointly optimize both metrics.

Technology Category

Application Category

📝 Abstract

We present Subject Fidelity Optimization (SFO), a novel comparative learning framework for zero-shot subject-driven generation that enhances subject fidelity. Beyond supervised fine-tuning methods that rely only on positive targets and use the diffusion loss as in the pre-training stage, SFO introduces synthetic negative targets and explicitly guides the model to favor positives over negatives through pairwise comparison. For negative targets, we propose Condition-Degradation Negative Sampling (CDNS), which automatically generates distinctive and informative negatives by intentionally degrading visual and textual cues without expensive human annotations. Moreover, we reweight the diffusion timesteps to focus finetuning on intermediate steps where subject details emerge. Extensive experiments demonstrate that SFO with CDNS significantly outperforms baselines in terms of both subject fidelity and text alignment on a subject-driven generation benchmark. Project page: https://subjectfidelityoptimization.github.io/

Problem

Research questions and friction points this paper is trying to address.

Enhancing subject fidelity in zero-shot generation

Generating synthetic negatives without human annotations

Improving text alignment in subject-driven generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Comparative learning framework enhances subject fidelity

Condition-Degradation Negative Sampling generates synthetic negatives

Reweighted diffusion timesteps focus on intermediate steps

🔎 Similar Papers

EZIGen: Enhancing zero-shot personalized image generation with precise subject encoding and decoupled guidance