Guiding Diffusion with Deep Geometric Moments: Balancing Fidelity and Variation

📅 2025-05-18

📈 Citations: 0

✨ Influential: 0

career value

216K/year

🤖 AI Summary

Text-to-image diffusion models often sacrifice generation diversity when employing strong spatial guidance (e.g., segmentation or depth maps). This work addresses the problem of achieving fine-grained, subject-aware control without compromising diversity. We propose Depth Geometry Moments (DGM), a novel guidance signal that introduces robust high-order geometric moments into diffusion model conditioning—focusing on local geometric structure of the primary subject rather than global semantics or pixel-level details. Our method integrates learned geometric prior modeling with latent-space conditional guidance. Extensive evaluations across multiple benchmarks demonstrate that DGM significantly improves the trade-off between control accuracy and generation diversity: it enables flexible, stable, and subject-consistent synthesis while preserving image fidelity, thereby overcoming the diversity suppression bottleneck inherent to conventional spatial-map guidance.

Technology Category

Application Category

📝 Abstract

Text-to-image generation models have achieved remarkable capabilities in synthesizing images, but often struggle to provide fine-grained control over the output. Existing guidance approaches, such as segmentation maps and depth maps, introduce spatial rigidity that restricts the inherent diversity of diffusion models. In this work, we introduce Deep Geometric Moments (DGM) as a novel form of guidance that encapsulates the subject's visual features and nuances through a learned geometric prior. DGMs focus specifically on the subject itself compared to DINO or CLIP features, which suffer from overemphasis on global image features or semantics. Unlike ResNets, which are sensitive to pixel-wise perturbations, DGMs rely on robust geometric moments. Our experiments demonstrate that DGM effectively balance control and diversity in diffusion-based image generation, allowing a flexible control mechanism for steering the diffusion process.

Problem

Research questions and friction points this paper is trying to address.

Achieving fine-grained control in text-to-image generation

Overcoming spatial rigidity in existing guidance approaches

Balancing control and diversity in diffusion-based image synthesis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep Geometric Moments guide diffusion models

Learned geometric prior captures subject nuances

Balances control and diversity in image generation

🔎 Similar Papers

No similar papers found.