Repulsor: Accelerating Generative Modeling with a Contrastive Memory Bank

📅 2025-12-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Denoising generative models (e.g., diffusion models) suffer from high training costs and inefficient representation learning. Existing discriminative-alignment approaches rely on external pretrained encoders, incurring additional computational overhead and domain-shift issues. This paper proposes the first encoder-free contrastive memory-bank framework: it decouples the number of negative samples from the batch size via a dynamically updated, large-scale negative queue, and it integrates a low-dimensional projection head with the denoising objective to enable self-contained contrastive learning with zero inference overhead. The method significantly accelerates convergence, achieving an FID of 2.40 on ImageNet-256 within 400K steps and setting a new state of the art at the time. It establishes a novel paradigm for efficient self-supervised representation learning in generative modeling, eliminating reliance on external architectures while preserving end-to-end trainability and inference efficiency.

📝 Abstract
The dominance of denoising generative models (e.g., diffusion, flow matching) in visual synthesis is tempered by their substantial training costs and inefficiencies in representation learning. While injecting discriminative representations via auxiliary alignment has proven effective, this approach still faces a key limitation: the reliance on external, pre-trained encoders introduces overhead and domain shift. A dispersion-based strategy that encourages strong separation among in-batch latent representations alleviates this specific dependency. To assess the effect of the number of negative samples in generative modeling, we propose {mname}, a plug-and-play training framework that requires no external encoders. Our method integrates a memory-bank mechanism that maintains a large, dynamically updated queue of negative samples across training iterations. This decouples the number of negatives from the mini-batch size, providing abundant, high-quality negatives for a contrastive objective without a multiplicative increase in computational cost. A low-dimensional projection head further minimizes memory and bandwidth overhead. {mname} offers three principal advantages: (1) it is self-contained, eliminating dependency on pretrained vision foundation models and their associated forward-pass overhead; (2) it introduces no additional parameters or computational cost during inference; and (3) it enables substantially faster convergence, achieving superior generative quality more efficiently. On ImageNet-256, {mname} achieves a state-of-the-art FID of **2.40** within 400k steps, significantly outperforming comparable methods.
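The paper's own implementation is not reproduced here, but the queue mechanism the abstract describes can be sketched as follows. This assumes a MoCo-style FIFO memory bank and an InfoNCE contrastive objective; all names, sizes, and dimensions are illustrative, not taken from the paper:

```python
import numpy as np

class MemoryBank:
    """FIFO queue of L2-normalized negative embeddings (illustrative sketch).

    Decouples the number of negatives from the mini-batch size: each
    training step pushes its batch into the queue, and the contrastive
    loss treats the entire queue as negatives.
    """
    def __init__(self, size, dim, seed=0):
        rng = np.random.default_rng(seed)
        q = rng.normal(size=(size, dim))
        self.queue = q / np.linalg.norm(q, axis=1, keepdims=True)
        self.ptr = 0

    def enqueue(self, batch):
        n = batch.shape[0]
        idx = (self.ptr + np.arange(n)) % self.queue.shape[0]
        self.queue[idx] = batch            # overwrite oldest entries
        self.ptr = (self.ptr + n) % self.queue.shape[0]

def info_nce(anchors, positives, negatives, tau=0.07):
    """InfoNCE: pull each anchor toward its positive, away from the queue."""
    pos = np.sum(anchors * positives, axis=1, keepdims=True) / tau  # (B, 1)
    neg = anchors @ negatives.T / tau                               # (B, K)
    logits = np.concatenate([pos, neg], axis=1)
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return -log_prob.mean()

# Toy usage: one training step's worth of embeddings.
bank = MemoryBank(size=4096, dim=32)
rng = np.random.default_rng(1)
z = rng.normal(size=(8, 32))
z /= np.linalg.norm(z, axis=1, keepdims=True)
loss = info_nce(z, z, bank.queue)   # positive = same view, for illustration
bank.enqueue(z)
```

Note the cost structure this sketch makes concrete: enlarging the queue adds only a `(B, K)` similarity product per step, rather than the multiplicative cost of enlarging the batch itself.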
Problem

Research questions and friction points this paper is trying to address.

Reduces reliance on external pre-trained encoders in generative models
Accelerates training convergence without increasing computational costs
Enhances generative quality using a contrastive memory bank mechanism
Innovation

Methods, ideas, or system contributions that make the work stand out.

Plug-and-play framework eliminates external encoder dependency
Memory bank provides abundant negatives without extra computation
Low-dimensional projection minimizes memory and bandwidth overhead
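The last point, storing compact projections in the bank rather than full denoiser features, can be sketched as below. The two-layer MLP head and the 768-to-64 dimensions are hypothetical choices for illustration, not details from the paper:

```python
import numpy as np

def projection_head(h, W1, b1, W2):
    """Project high-dim denoiser features to a low-dim contrastive space.

    Only the low-dim, L2-normalized output is enqueued in the memory
    bank, shrinking queue memory and bandwidth by dim_out / dim_in.
    """
    z = np.maximum(h @ W1 + b1, 0.0)   # hidden layer with ReLU
    z = z @ W2                          # down-projection
    return z / np.linalg.norm(z, axis=1, keepdims=True)

rng = np.random.default_rng(0)
h = rng.normal(size=(8, 768))           # hypothetical backbone feature dim
W1 = rng.normal(size=(768, 768)) * 0.02
b1 = np.zeros(768)
W2 = rng.normal(size=(768, 64)) * 0.02  # project to 64 dims
z = projection_head(h, W1, b1, W2)
# Storing 64-d instead of 768-d vectors shrinks the bank by 12x.
```

Because the head feeds only the auxiliary contrastive loss, it can be dropped at inference time, which is consistent with the zero-inference-overhead claim above.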