🤖 AI Summary
To make high-resolution (4K) diffusion-based image editing practical on resource-constrained devices, particularly mobile platforms, this paper proposes MobilePicasso, a three-stage editing system: (1) editing at a standard resolution with a hallucination-aware loss that suppresses generation artifacts; (2) latent projection, which avoids a round trip through pixel space; and (3) upscaling the edited latent to a higher resolution with adaptive context-preserving tiling. In a user study with 46 participants, the approach improves image quality by 18–48% and reduces hallucinations by 14–51% over existing methods. It achieves up to a 55.8× speed-up at the cost of about 9% more runtime memory than prior work, and its on-device runtime is faster than a server-based high-resolution image editing model running on an A100 GPU.
📝 Abstract
High-resolution (4K) image-to-image synthesis has become increasingly important for mobile applications. Existing diffusion models for image editing face significant challenges in memory and image quality when deployed on resource-constrained devices. In this paper, we present MobilePicasso, a novel system that enables efficient image editing at high resolutions while minimising computational cost and memory usage. MobilePicasso comprises three stages: (i) performing image editing at a standard resolution with a hallucination-aware loss, (ii) applying latent projection to avoid going to pixel space, and (iii) upscaling the edited image latent to a higher resolution with adaptive context-preserving tiling. Our user study with 46 participants reveals that MobilePicasso not only improves image quality by 18-48% but also reduces hallucinations by 14-51% over existing methods. MobilePicasso demonstrates significantly lower latency, e.g., up to a 55.8$\times$ speed-up, with only a small increase in runtime memory, e.g., a mere 9% increase over prior work. Surprisingly, the on-device runtime of MobilePicasso is observed to be faster than a server-based high-resolution image editing model running on an A100 GPU.
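To give a feel for what tile-based upscaling in stage (iii) involves, the following is a minimal sketch of generic overlapping-tile processing with weighted blending. It is not the paper's adaptive context-preserving tiling, whose details the abstract does not give. The function names (`tile_starts`, `blend_window`, `upscale_tiled`, `nearest_2x`), the triangular blend weights, the fixed tile size and overlap, and the single-channel 2D grid are all assumptions chosen for illustration; a real system would run a learned upscaler on multi-channel latents.

```python
def tile_starts(size, tile, overlap):
    """Start offsets of tiles of length `tile` that cover [0, size),
    with neighbouring tiles sharing at least `overlap` samples."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must satisfy 0 <= overlap < tile")
    if tile >= size:
        return [0]
    stride = tile - overlap
    starts = list(range(0, size - tile, stride))
    starts.append(size - tile)  # last tile sits flush with the border
    return starts


def blend_window(n):
    """Triangular weights: strictly positive, largest at the tile centre."""
    return [min(i + 1, n - i) for i in range(n)]


def upscale_tiled(latent, upscale_fn, scale, tile=32, overlap=8):
    """Upscale a 2D grid tile by tile and blend the overlapping regions.

    `upscale_fn` is a hypothetical stand-in for the upscaler: it maps an
    h x w patch to an (h * scale) x (w * scale) patch. Only one tile is
    upscaled at a time, which bounds the upscaler's working memory.
    """
    h, w = len(latent), len(latent[0])
    acc = [[0.0] * (w * scale) for _ in range(h * scale)]   # weighted sum
    wsum = [[0.0] * (w * scale) for _ in range(h * scale)]  # sum of weights
    for y in tile_starts(h, tile, overlap):
        for x in tile_starts(w, tile, overlap):
            patch = [row[x:x + tile] for row in latent[y:y + tile]]
            up = upscale_fn(patch)
            wy, wx = blend_window(len(up)), blend_window(len(up[0]))
            for i, up_row in enumerate(up):
                for j, value in enumerate(up_row):
                    k = wy[i] * wx[j]
                    acc[y * scale + i][x * scale + j] += k * value
                    wsum[y * scale + i][x * scale + j] += k
    # Normalise so overlapping tiles are averaged rather than summed.
    return [[a / s for a, s in zip(a_row, s_row)]
            for a_row, s_row in zip(acc, wsum)]


def nearest_2x(patch):
    """Toy 2x nearest-neighbour upscaler used only to exercise the sketch."""
    out = []
    for row in patch:
        wide = [v for v in row for _ in range(2)]
        out.extend([wide, list(wide)])
    return out
```

With a purely local upscaler such as `nearest_2x`, the tiled result matches upscaling the whole grid at once. With a model that looks at neighbouring content, the overlap and blending are what hide seams between tiles, which is presumably the problem a context-preserving strategy is designed to handle.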