Generative Fields: Uncovering Hierarchical Feature Control for StyleGAN via Inverted Receptive Fields

📅 2025-04-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
StyleGAN generates high-fidelity faces, but strong entanglement in its W space hinders fine-grained, disentangled semantic editing. To address this, we propose “generative field” theory, which inverts the CNN receptive-field paradigm to characterize the influence of each network layer on specific regions of the output image. Instead of fine-tuning in W space, we work in the channel-wise style latent space S, which is aligned with StyleGAN’s architecture and allows features to be localized and modulated directly. Our method requires no pre-training, avoids reliance on external image or text encoders, and supports real-time, semantics-aware editing. Experiments demonstrate substantial improvements in both disentanglement and fidelity for facial attribute manipulation and part-level layout control.

📝 Abstract
StyleGAN has demonstrated the ability of GANs to synthesize highly realistic faces of imaginary people from random noise. One limitation of GAN-based image generation is the difficulty of controlling the features of the generated image, due to the strong entanglement of the low-dimensional latent space. Previous work that aimed to control StyleGAN with image or text prompts modulated sampling in the W latent space, which is more expressive than the Z latent space. However, W space still has restricted expressivity because it does not control feature synthesis directly; in addition, feature embedding in W space requires a pre-training process to reconstruct the style signal, which limits its application. This paper introduces the concept of "generative fields" to explain hierarchical feature synthesis in StyleGAN, inspired by the receptive fields of convolutional neural networks (CNNs). Additionally, we propose a new image editing pipeline for StyleGAN that uses generative field theory and the channel-wise style latent space S, exploiting the intrinsic structure of CNNs to achieve disentangled control of feature synthesis at synthesis time.
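
The receptive-field analogy in the abstract can be illustrated with a small calculation. The sketch below is not the paper's definition of a generative field; it only assumes that a generator layer upsamples by an integer factor (nearest-neighbour, with StyleGAN's blur filters ignored) and then applies a stride-1 convolution, and it tracks how wide a region of the output one feature-map location can influence. The function name `generative_field_sizes` and the toy layer list are hypothetical.

```python
def generative_field_sizes(layers):
    """Estimate, for each layer, how many output pixels (per side) a single
    input location of that layer can influence.

    `layers` is a list of (upsample_factor, kernel_size) pairs ordered from
    the first synthesis layer to the last. Assumptions: nearest-neighbour
    upsampling, stride-1 convolutions, no clipping at the image border.
    """
    sizes = []
    for start in range(len(layers)):
        footprint = 1  # one location in the input of layer `start`
        for upsample, kernel in layers[start:]:
            # Upsampling stretches the footprint; a k x k convolution
            # then widens it by k - 1 pixels.
            footprint = footprint * upsample + (kernel - 1)
        sizes.append(footprint)
    return sizes


# Toy generator: two resolution doublings, each followed by an extra 3x3 conv.
toy_layers = [(2, 3), (1, 3), (2, 3), (1, 3)]
print(generative_field_sizes(toy_layers))
```

Under these assumptions, earlier layers reach larger output regions than later ones, which matches the coarse-to-fine hierarchy the abstract describes.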

Problem

Research questions and friction points this paper is trying to address.

Control hierarchical features in StyleGAN-generated images
Address limited expressivity of W latent space
Enable disentangled feature control via generative fields

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces generative fields for hierarchical feature control
Utilizes channel-wise style latent space S
Proposes new image editing pipeline for StyleGAN
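
To make the idea of channel-wise control concrete, here is a minimal sketch of style modulation at a single pixel. It follows the StyleGAN2-style modulate-then-demodulate formula reduced to a 1x1 kernel, with one style scale per input channel; it is not the paper's editing pipeline, and `modulated_1x1`, the weights, and the style values are hypothetical.

```python
import math


def modulated_1x1(x, weight, style, eps=1e-8):
    """Apply a style-modulated 1x1 convolution at one pixel.

    x      : list of input channel activations
    weight : weight[j][i] maps input channel i to output channel j
    style  : one scale per input channel (the channel-wise style vector)
    """
    out = []
    for row in weight:
        # Modulate: scale each input channel's weight by its style value.
        scaled = [w * s for w, s in zip(row, style)]
        # Demodulate: normalise the scaled weights of this output channel.
        norm = math.sqrt(sum(v * v for v in scaled) + eps)
        out.append(sum(v * xi for v, xi in zip(scaled, x)) / norm)
    return out


weight = [[1.0, 0.0],   # output 0 ignores input channel 1
          [0.5, 2.0]]   # output 1 uses both input channels
x = [1.0, 1.0]

base = modulated_1x1(x, weight, style=[1.0, 1.0])
edited = modulated_1x1(x, weight, style=[1.0, 3.0])  # edit one style channel
print(base, edited)
```

In this toy case, changing the second style value alters only the output channel that actually reads from the second input channel, which is the kind of localized, per-channel effect that makes S space attractive for editing.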