🤖 AI Summary
Designers need efficient ways to extract standardized, front-facing, and reusable design assets from open-scene images, but existing generative models struggle to simultaneously ensure high fidelity, canonical front-view alignment, and robustness, particularly under occlusion and perspective distortion. This paper introduces the first generative framework tailored to design asset extraction. Its key idea is an inverse-paste mechanism used to construct a reward model: extracted assets are pasted back into their reference sources, enabling closed-loop reinforcement optimization that substantially mitigates hallucination and improves prompt adherence. Built on a diffusion architecture, the method is pretrained on over 200K synthetic image-subject pairs, fine-tuned with closed-loop reward feedback, and rigorously evaluated on a real-world benchmark. Experiments demonstrate state-of-the-art performance in design asset extraction, yielding high-fidelity, front-view-aligned, and editable outputs, and the framework has been validated within real-world design workflows.
📝 Abstract
Recent research on generative models has primarily focused on creating product-ready visual outputs; however, designers often favor access to standardized asset libraries, a domain that has yet to benefit significantly from generative capabilities. Although open-world scenes provide ample raw material for designers, efficiently extracting high-quality, standardized assets from them remains a challenge. To address this, we introduce AssetDropper, the first framework designed to extract assets from reference images, providing artists with an open-world asset palette. Our model extracts a front view of a selected subject from the input image, handling complex scenarios such as perspective distortion and subject occlusion. We establish a synthetic dataset of more than 200,000 image-subject pairs and a real-world benchmark with thousands more for evaluation, facilitating future research on downstream tasks. Furthermore, to ensure precise asset extraction that aligns well with the image prompts, we employ a pre-trained reward model to close the loop with feedback: it performs the inverse task of pasting the extracted assets back into the reference sources, which adds a consistency signal during training and mitigates hallucination. Extensive experiments show that, with the aid of reward-driven optimization, AssetDropper achieves state-of-the-art results in asset extraction. Project page: AssetDropper.github.io.
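The inverse-paste reward can be illustrated with a toy sketch. Everything here is an assumption for illustration, not the paper's implementation: the function name `inverse_paste_reward`, the assumption that the extracted asset is already warped into scene coordinates, and the choice of masked MSE mapped through an exponential as the consistency score. The idea is simply that an asset which pastes back cleanly into its reference source earns a high reward.

```python
import numpy as np

def inverse_paste_reward(reference: np.ndarray,
                         extracted: np.ndarray,
                         mask: np.ndarray) -> float:
    """Toy consistency reward for the inverse-paste task.

    reference: (H, W, 3) float array, the original scene image.
    extracted: (H, W, 3) float array, the extracted asset already
               warped back into scene coordinates (an assumption).
    mask:      (H, W) bool array marking the subject region.

    Returns a reward in (0, 1]; higher means the pasted-back asset
    is more consistent with the reference in the subject region.
    """
    # Paste the extracted asset into the reference at the masked region.
    pasted = np.where(mask[..., None], extracted, reference)
    # Score pixel consistency inside the subject region only.
    mse = np.mean((pasted[mask] - reference[mask]) ** 2)
    # Monotone map from error to a bounded reward.
    return float(np.exp(-mse))
```

In a closed loop, this scalar would weight or rank the extractor's samples during reinforcement fine-tuning: a perfect paste-back yields a reward of 1.0, and the reward decays as the extracted asset diverges from the source region.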