🤖 AI Summary
This work addresses the fundamental trade-off between visual exaggeration and semantic fidelity in stylized abstract image generation—particularly for out-of-distribution identities, identity preservation, and cross-style generalization. We propose the first training-free framework: (1) leveraging vision-language model (VLM) inference-time feature scaling to extract robust identity representations; (2) designing a cross-domain rectified flow inversion for zero-shot transfer to highly abstract styles (e.g., LEGO, knitted dolls); (3) introducing style-aware temporal scheduling and dynamic structural restoration; and (4) constructing StyleBench, a GPT-driven benchmark for evaluating abstraction quality. Our method enables multi-round, controllable abstraction from a single input image and is released as a fully open-source stack. Experiments demonstrate significant improvements over state-of-the-art methods in identity recognizability, stylistic diversity, and abstraction plausibility—especially in scenarios where pixel-level metrics fail.
📝 Abstract
Stylized abstraction synthesizes visually exaggerated yet semantically faithful representations of subjects, balancing recognizability with perceptual distortion. Unlike image-to-image translation, which prioritizes structural fidelity, stylized abstraction demands selective retention of identity cues while embracing stylistic divergence, which is especially challenging for out-of-distribution individuals. We propose a training-free framework that generates stylized abstractions from a single image, using inference-time scaling in vision-language models (VLMs) to extract identity-relevant features, together with a novel cross-domain rectified flow inversion strategy that reconstructs structure from style-dependent priors. Our method adapts structural restoration dynamically through style-aware temporal scheduling, enabling high-fidelity reconstructions that honor both subject and style. It supports multi-round abstraction-aware generation without fine-tuning. To evaluate this task, we introduce StyleBench, a GPT-based, human-aligned metric suited to abstract styles where pixel-level similarity fails. Experiments across diverse abstraction styles (e.g., LEGO, knitted dolls, South Park) show strong generalization to unseen identities and styles in a fully open-source setup.