AnimalBooth: Multimodal Feature Enhancement for Animal Subject Personalization

📅 2025-09-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Personalized animal image generation suffers from severe identity drift. Animals vary widely in appearance and anatomy, and existing methods misalign features across domains, so the generated subject loses its identity. To address this, we propose AnimalBooth: (1) a lightweight Animal Net backbone integrated with an adaptive attention module to enforce cross-modal identity feature alignment; (2) a discrete cosine transform (DCT)-based frequency-domain feature fusion mechanism that enables progressive generation, from global structure to fine-grained texture; and (3) a diffusion-based generative framework incorporating multimodal feature fusion and latent-space modulation. We train and evaluate the model on AnimalBench, a newly curated high-quality animal image dataset. Experiments show that AnimalBooth achieves state-of-the-art identity fidelity and visual quality, and AnimalBench provides a benchmark for future research in personalized animal image generation.

📝 Abstract

Personalized animal image generation is challenging due to rich appearance cues and large morphological variability. Existing approaches often exhibit feature misalignment across domains, which leads to identity drift. We present AnimalBooth, a framework that strengthens identity preservation with an Animal Net and an adaptive attention module, mitigating cross-domain alignment errors. We further introduce a frequency-controlled feature integration module that applies Discrete Cosine Transform (DCT) filtering in the latent space to guide the diffusion process, enabling a coarse-to-fine progression from global structure to detailed texture. To advance research in this area, we curate AnimalBench, a high-resolution dataset for animal personalization. Extensive experiments show that AnimalBooth consistently outperforms strong baselines on multiple benchmarks and improves both identity fidelity and perceptual quality.
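
The abstract does not say how the DCT filtering is implemented, so the following is only a minimal sketch of the general idea: low-pass filter a latent feature map in the DCT domain and widen the pass band as denoising proceeds, so coarse structure is available before fine texture. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names (dct_matrix, dct_lowpass, keep_ratio_at), the rectangular low-frequency mask, the linear schedule, and the (C, H, W) latent layout.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C: C @ x is the DCT of x, C.T @ y inverts it."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # DC row uses a different scale factor
    return c


def dct_lowpass(latent: np.ndarray, keep_ratio: float) -> np.ndarray:
    """Keep only the lowest-frequency DCT coefficients of a (C, H, W) latent.

    Assumption: a rectangular mask over the top-left (low-frequency) corner
    of the coefficient grid; the paper may use a different filter shape.
    """
    _, h, w = latent.shape
    ch, cw = dct_matrix(h), dct_matrix(w)
    coeffs = ch @ latent @ cw.T  # 2D DCT, broadcast over channels
    kh = max(1, int(round(keep_ratio * h)))
    kw = max(1, int(round(keep_ratio * w)))
    mask = np.zeros((h, w))
    mask[:kh, :kw] = 1.0
    return ch.T @ (coeffs * mask) @ cw  # inverse 2D DCT


def keep_ratio_at(step: int, num_steps: int, min_ratio: float = 0.25) -> float:
    """Hypothetical linear schedule: narrow band early, full band at the end."""
    t = step / max(1, num_steps - 1)
    return min_ratio + (1.0 - min_ratio) * t


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.standard_normal((4, 32, 32))  # stand-in for identity features
    num_steps = 10
    for step in range(num_steps):
        ratio = keep_ratio_at(step, num_steps)
        guidance = dct_lowpass(reference, ratio)
        # A real sampler would inject `guidance` into the denoiser at this step.
        print(step, round(ratio, 3), float(np.linalg.norm(guidance)))
```

Because the transform is orthonormal, the mask acts as a projection: a keep ratio of 1.0 returns the input unchanged, and smaller ratios can only reduce the feature norm. In a real pipeline the filtered features would condition the denoising network at each step rather than being printed.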

Problem

Research questions and friction points this paper is trying to address.

Addressing identity drift in personalized animal image generation

Mitigating cross-domain feature misalignment for animal subjects

Enhancing identity preservation with multimodal feature enhancement

Innovation

Methods, ideas, or system contributions that make the work stand out.

Animal Net and adaptive attention module for identity preservation

Frequency-controlled feature integration using DCT filtering

Coarse-to-fine progression from structure to texture
