🤖 AI Summary
This work addresses the challenge of learning and recombining fine-grained part-level concepts from single image exemplars in text-to-image diffusion models. Methodologically: (1) it introduces a dynamic data synthesis pipeline to alleviate the severe data scarcity inherent in single-example learning (sketched below); (2) it proposes a concept predictor that maximizes mutual information between denoised latents and concept codes, enabling explicit latent-space supervision for part disentanglement and recomposition; and (3) it incorporates structured concept encoding to ensure semantic consistency during cross-category part composition. Experiments demonstrate that the method significantly outperforms existing subject-level and part-level baselines in both part disentanglement fidelity and cross-category concept recombination. Notably, it achieves high-fidelity, controllable part-level generation under extreme few-shot conditions, requiring only one exemplar per target part, while preserving structural coherence and semantic plausibility across diverse object categories.
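To make contribution (1) concrete, here is a minimal sketch of what a dynamic part-composition synthesizer could look like. It assumes each part concept is available as a single segmented RGBA crop paired with a placeholder token like `<part_k>`; the function name, the collage-style compositing, and the prompt template are illustrative assumptions, not the paper's exact pipeline.

```python
# Hypothetical sketch: compose random subsets of one-shot part crops
# into diverse training images with matching prompts.
import random
from PIL import Image

def synthesize_composition(parts, canvas_size=(512, 512), min_parts=2):
    """parts: list of (token, RGBA crop) pairs, one crop per concept.
    Returns a composed RGB image, a prompt, and the included tokens."""
    canvas = Image.new("RGBA", canvas_size, (255, 255, 255, 255))
    chosen = random.sample(parts, k=random.randint(min_parts, len(parts)))
    tokens = []
    for token, crop in chosen:
        # Random resizing and placement yield diverse compositions
        # from the same single exemplars (crops assumed smaller
        # than the canvas).
        scale = random.uniform(0.5, 1.0)
        w, h = max(1, int(crop.width * scale)), max(1, int(crop.height * scale))
        resized = crop.resize((w, h))
        x = random.randint(0, max(0, canvas_size[0] - w))
        y = random.randint(0, max(0, canvas_size[1] - h))
        canvas.alpha_composite(resized, (x, y))
        tokens.append(token)
    prompt = "a photo of " + " and ".join(tokens)
    return canvas.convert("RGB"), prompt, tokens
```

Each synthesized pair (image, prompt) can then be used as an ordinary fine-tuning example, so the diffusion model never sees the same part combination twice.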
📝 Abstract
We present PartComposer, a framework for part-level concept learning from single-image examples that enables text-to-image diffusion models to compose novel objects from meaningful components. Existing methods either struggle to learn fine-grained concepts effectively or require a large dataset as input. We propose a dynamic data synthesis pipeline that generates diverse part compositions to address one-shot data scarcity. Most importantly, we propose to maximize the mutual information between denoised latents and structured concept codes via a concept predictor, enabling direct supervision of concept disentanglement and re-composition. Our method achieves strong disentanglement and controllable composition, outperforming subject-level and part-level baselines when mixing concepts from the same or different object categories.
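The mutual-information objective can be understood as a variational lower bound in the InfoGAN style: an auxiliary predictor is trained to recover which concept codes are present from the denoised latent, and its cross-entropy is added to the diffusion loss. The sketch below shows this idea; the `ConceptPredictor` architecture, the multi-hot `concept_codes` encoding, the `z0_pred` input (the model's denoised-latent estimate), and the `lambda_mi` weighting are assumptions for illustration, not the paper's exact design.

```python
# Minimal sketch of MI supervision via a concept predictor (PyTorch).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptPredictor(nn.Module):
    """Predicts, from a denoised latent, which part concepts are present."""
    def __init__(self, latent_channels: int, num_concepts: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(latent_channels, 64, 3, stride=2, padding=1),
            nn.SiLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1),
            nn.SiLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(128, num_concepts),  # one logit per concept code
        )

    def forward(self, z0_pred: torch.Tensor) -> torch.Tensor:
        return self.net(z0_pred)

def mi_loss(predictor: ConceptPredictor,
            z0_pred: torch.Tensor,
            concept_codes: torch.Tensor) -> torch.Tensor:
    """Multi-label cross-entropy between predicted and true concept codes;
    minimizing it maximizes a variational lower bound on the mutual
    information I(denoised latent; concept codes)."""
    logits = predictor(z0_pred)
    return F.binary_cross_entropy_with_logits(logits, concept_codes.float())

# Hypothetical usage inside the training step:
#   total_loss = diffusion_loss + lambda_mi * mi_loss(predictor, z0_pred, codes)
```

Because the predictor can only succeed if each part's information survives in the latent, this auxiliary loss pressures the model to keep concepts separable (disentanglement) while still composing them in one image (re-composition).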