AnimalBooth: Multimodal Feature Enhancement for Animal Subject Personalization

📅 2025-09-20
🤖 AI Summary
Personalized animal image generation suffers from severe identity drift due to high inter-species appearance diversity and large anatomical variations, primarily caused by cross-domain feature misalignment. To address this, we propose AnimalBooth: (1) a lightweight AnimalNet backbone integrated with an adaptive attention module to enforce cross-modal identity feature alignment; (2) a discrete cosine transform (DCT)-based frequency-domain feature fusion mechanism enabling progressive generation, from global structure to fine-grained texture; and (3) a diffusion-based generative framework incorporating multimodal feature fusion and latent-space modulation. We train and evaluate the model on AnimalBench, a newly curated high-quality animal image dataset. Experiments demonstrate that AnimalBooth achieves state-of-the-art performance in both identity fidelity and visual quality. Moreover, AnimalBench establishes a valuable benchmark for future research in personalized animal image generation.

📝 Abstract
Personalized animal image generation is challenging due to rich appearance cues and large morphological variability. Existing approaches often exhibit feature misalignment across domains, which leads to identity drift. We present AnimalBooth, a framework that strengthens identity preservation with an Animal Net and an adaptive attention module, mitigating cross-domain alignment errors. We further introduce a frequency-controlled feature integration module that applies Discrete Cosine Transform (DCT) filtering in the latent space to guide the diffusion process, enabling a coarse-to-fine progression from global structure to detailed texture. To advance research in this area, we curate AnimalBench, a high-resolution dataset for animal personalization. Extensive experiments show that AnimalBooth consistently outperforms strong baselines on multiple benchmarks and improves both identity fidelity and perceptual quality.

Problem

Research questions and friction points this paper is trying to address.

Addressing identity drift in personalized animal image generation
Mitigating cross-domain feature misalignment for animal subjects
Enhancing identity preservation with multimodal feature enhancement

Innovation

Methods, ideas, or system contributions that make the work stand out.

Animal Net and adaptive attention module for identity preservation
Frequency controlled feature integration using DCT filtering
Coarse-to-fine progression from structure to texture
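
The frequency-controlled, coarse-to-fine idea can be illustrated with a minimal sketch. It is not the authors' module: the page gives no implementation details, so the mask shape (keep coefficients whose index sum `u + v` falls under a cutoff), the linear schedule, and every function name here (`low_pass`, `keep_ratio_schedule`, and the helpers) are assumptions made for illustration. The sketch uses only the standard library and treats a single 2D feature map as a list of lists.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a 1D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct_1d(c):
    """Inverse of dct_1d (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(
            c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
            for k in range(1, n)
        )
        out.append(s)
    return out

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct_2d(m):
    """Separable 2D DCT: transform rows, then columns."""
    rows = [dct_1d(r) for r in m]
    return transpose([dct_1d(c) for c in transpose(rows)])

def idct_2d(m):
    """Separable 2D inverse DCT: invert rows, then columns."""
    rows = [idct_1d(r) for r in m]
    return transpose([idct_1d(c) for c in transpose(rows)])

def low_pass(feature, keep_ratio):
    """Zero out DCT coefficients whose index sum u + v exceeds a cutoff.

    keep_ratio = 0 keeps only the DC term (global average);
    keep_ratio = 1 keeps every coefficient (full detail).
    The triangular mask is an illustrative assumption.
    """
    h, w = len(feature), len(feature[0])
    cutoff = keep_ratio * (h + w - 2)
    coeffs = dct_2d(feature)
    masked = [
        [coeffs[u][v] if u + v <= cutoff else 0.0 for v in range(w)]
        for u in range(h)
    ]
    return idct_2d(masked)

def keep_ratio_schedule(step, num_steps):
    """Hypothetical linear schedule: 0 at the first step, 1 at the last."""
    return step / (num_steps - 1) if num_steps > 1 else 1.0
```

Calling `low_pass(feature, keep_ratio_schedule(t, T))` for `t = 0 .. T - 1` would pass progressively more high-frequency content of the reference feature as generation proceeds, which mirrors the structure-to-texture ordering described above. In practice a library routine such as `scipy.fft.dctn` would replace the pure-Python transform, and the filtering would run on batched latent tensors rather than a single map.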