🤖 AI Summary
This work addresses the fundamental trade-off between visual exaggeration and semantic fidelity in stylized abstract image generation—particularly for out-of-distribution identities, identity preservation, and cross-style generalization. We propose the first training-free framework: (1) leveraging vision-language model (VLM) inference-time feature scaling to extract robust identity representations; (2) designing a cross-domain rectified flow inversion for zero-shot transfer to highly abstract styles (e.g., LEGO, knitted dolls); (3) introducing style-aware temporal scheduling and dynamic structural restoration; and (4) constructing StyleBench, a GPT-driven benchmark for evaluating abstraction quality. Our method enables multi-round, controllable abstraction from a single input image and is released as a fully open-source stack. Experiments demonstrate significant improvements over state-of-the-art methods in identity recognizability, stylistic diversity, and abstraction plausibility—especially in scenarios where pixel-level metrics fail.
📝 Abstract
Stylized abstraction synthesizes visually exaggerated yet semantically faithful representations of subjects, balancing recognizability with perceptual distortion. Unlike image-to-image translation, which prioritizes structural fidelity, stylized abstraction demands selective retention of identity cues while embracing stylistic divergence, which is especially challenging for out-of-distribution individuals. We propose a training-free framework that generates stylized abstractions from a single image, using inference-time scaling in vision-language models (VLMs) to extract identity-relevant features, together with a novel cross-domain rectified flow inversion strategy that reconstructs structure from style-dependent priors. Our method adapts structural restoration dynamically through style-aware temporal scheduling, enabling high-fidelity reconstructions that honor both subject and style. It supports multi-round abstraction-aware generation without fine-tuning. To evaluate this task, we introduce StyleBench, a GPT-based, human-aligned metric suited to abstract styles where pixel-level similarity fails. Experiments across diverse abstraction styles (e.g., LEGO, knitted dolls, South Park) show strong generalization to unseen identities and styles in a fully open-source setup.