🤖 AI Summary
Existing diffusion models are hard to deploy for general-purpose image editing on mobile devices: their large parameter counts and text-conditioning components typically force server-side inference, which brings high computational costs and privacy risks. This work proposes BlazeEdit, a lightweight, text-free, multi-task image-to-image diffusion model that unifies object removal, outpainting, tone correction, relighting, and sticker generation in a single compact architecture. With only 195 million parameters, the model completes a full inference pass in 290 ms on a Pixel 10, substantially reducing download size and memory consumption while maintaining competitive generation quality. This enables fast, privacy-preserving, on-device general image editing.
📝 Abstract
The remarkable generation quality of modern diffusion models often comes at the cost of massive parameter counts, which necessitate server-side inference with significant computational costs and potential privacy risks. Consequently, there is growing momentum toward developing efficient on-device alternatives. While recent efforts have optimized text-to-image models for mobile hardware, they remain relatively bulky, typically ranging from 0.5B to 1B parameters. We present BlazeEdit, a highly efficient, generalist image-to-image diffusion model tailored for on-device deployment. By identifying that many practical image editing tasks do not require text-based guidance, we eliminate the text-conditioning components and develop a multi-task architecture that consolidates object removal, outpainting, tone correction, relighting, and sticker generation into a single, compact model of only 195M parameters. BlazeEdit achieves a substantial reduction in download size and memory overhead while maintaining competitive generation quality. It completes a full inference pass in just 290 ms on a Pixel 10, delivering a seamless, privacy-preserving, and lightning-fast experience for generalist image editing on the edge.
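To give a rough sense of what the parameter counts above mean for download size, the following back-of-envelope sketch converts parameter counts into raw weight storage. The numeric precisions (fp32, fp16, int8) are assumptions for illustration only; the abstract does not state which precision or compression BlazeEdit ships with, and real packages also include metadata and any auxiliary components not counted here.

```python
# Back-of-envelope weight-storage estimate.
# Assumption: the precisions below are illustrative; the source does not
# say how BlazeEdit's weights are stored or quantized.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

# Parameter counts taken from the abstract: 195M for BlazeEdit,
# 0.5B to 1B for recent mobile text-to-image models.
MODELS = {
    "BlazeEdit (195M)": 195_000_000,
    "text-to-image (0.5B)": 500_000_000,
    "text-to-image (1B)": 1_000_000_000,
}


def weight_size_mb(num_params: int, precision: str) -> float:
    """Raw weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e6


def size_table() -> dict:
    """Map each model name to its estimated size per precision."""
    return {
        name: {p: weight_size_mb(n, p) for p in BYTES_PER_PARAM}
        for name, n in MODELS.items()
    }


if __name__ == "__main__":
    for name, sizes in size_table().items():
        row = ", ".join(f"{p}: {mb:.0f} MB" for p, mb in sizes.items())
        print(f"{name:24s} {row}")
```

Under these assumptions, a 195M-parameter model needs roughly 2.5 to 5 times less weight storage than a 0.5B to 1B model at the same precision, which is consistent with the reduction in download size and memory overhead the abstract describes.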