🤖 AI Summary
Few-shot image generation suffers from alignment distortion caused by a structural mismatch between the source and target distributions: conventional latent-space consistency constraints, whether instance-level or distribution-level, lead to content distortion when overly strict or to weakened transfer when too loose, a problem further exacerbated by distribution-estimation bias under extreme target-sample scarcity. This paper proposes a two-level domain alignment framework that achieves structure-preserving cross-domain alignment within a self-rotated proxy feature space. Its core innovation is the Equivariant Feature Rotation (EFR) mechanism: learning differentiable rotation matrices within a parameterized Lie group so that feature transformations are equivariant and lossless. Combined with proxy-space construction, few-shot fine-tuning, and joint feature-alignment optimization, the method reduces FID by over 25% across multiple benchmarks, significantly improving generation quality and diversity while mitigating content distortion and distributional bias.
📝 Abstract
Few-shot image generation aims to adapt a source generative model to a target domain using very few training images. Most existing approaches introduce consistency constraints (typically instance-level or distribution-level loss functions) to directly align the distribution patterns of the source and target domains within their respective latent spaces. However, these strategies often fall short: overly strict constraints can amplify the negative effects of the domain gap, leading to distorted or uninformative content, while overly relaxed constraints may fail to leverage the source domain effectively. This limitation primarily stems from the inherent discrepancy between the underlying distribution structures of the source and target domains. The scarcity of target samples further compounds the issue by hindering accurate estimation of the target domain's distribution. To overcome these limitations, we propose Equivariant Feature Rotation (EFR), a novel adaptation strategy that aligns the source and target domains at two complementary levels within a self-rotated proxy feature space. Specifically, we perform adaptive rotations within a parameterized Lie group to transform both source and target features into an equivariant proxy space, where alignment is conducted. These learnable rotation matrices bridge the domain gap by preserving intra-domain structural information without distortion, while the alignment optimization facilitates effective knowledge transfer from the source domain to the target domain. Comprehensive experiments on a variety of commonly used datasets demonstrate that our method significantly enhances generative performance in the target domain.
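To make the Lie-group parameterization concrete, here is a minimal NumPy sketch (not the authors' implementation; the feature dimension, parameter names, and the truncated-series exponential are illustrative assumptions). The key fact it demonstrates is standard: exponentiating a skew-symmetric generator from the Lie algebra so(n) yields an orthogonal rotation matrix in SO(n), so optimizing the generator's free parameters produces differentiable, norm-preserving (hence lossless, structure-preserving) feature transforms.

```python
import numpy as np

def skew(params, n):
    """Build an n×n skew-symmetric matrix from n(n-1)/2 free parameters.
    Skew-symmetric matrices form the Lie algebra so(n) of the rotation group SO(n)."""
    A = np.zeros((n, n))
    iu = np.triu_indices(n, k=1)
    A[iu] = params
    return A - A.T

def expm_taylor(A, terms=30):
    """Matrix exponential via truncated Taylor series (illustrative only;
    a practical implementation would use torch.matrix_exp or scipy.linalg.expm)."""
    R = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        R = R + term
    return R

# Hypothetical learnable parameters for a 4-dimensional feature space.
rng = np.random.default_rng(0)
theta = rng.normal(scale=0.1, size=4 * 3 // 2)  # 6 free parameters for so(4)
R = expm_taylor(skew(theta, 4))

# exp of a skew-symmetric matrix is orthogonal with determinant 1:
# the rotation preserves norms and inner products, i.e. it transforms
# features without distorting their intra-domain structure.
assert np.allclose(R @ R.T, np.eye(4), atol=1e-8)
assert np.isclose(np.linalg.det(R), 1.0)
x = rng.normal(size=4)
assert np.isclose(np.linalg.norm(R @ x), np.linalg.norm(x))
```

In a training loop, `theta` would be the learnable parameter updated by gradient descent, with rotated source and target features compared by the paper's two-level alignment losses in the shared proxy space.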