🤖 AI Summary
To balance image quality and inference efficiency in few-step text-to-image diffusion models for image inpainting, this paper proposes TurboFill: a lightweight mask-aware inpainting adapter built on the few-step distilled diffusion model DMD2 and trained with a novel three-step adversarial scheme. Methodologically, TurboFill combines few-step diffusion distillation, mask-conditioned modeling, and adapter-based fine-tuning to substantially reduce computational overhead. The contributions are twofold: (1) two new benchmarks—DilationBench and HumanBench—designed to reflect real-world inpainting requirements and to emphasize human visual alignment in evaluation; (2) a state-of-the-art trade-off between speed and quality, with TurboFill outperforming the multi-step BrushNet and existing few-step methods across diverse mask scales and complex text prompts. Experiments demonstrate substantial improvements in both fidelity and efficiency, setting a new standard for practical, high-fidelity few-step inpainting.
📝 Abstract
This paper introduces TurboFill, a fast image inpainting model that augments a few-step text-to-image diffusion model with an inpainting adapter for high-quality, efficient inpainting. While standard diffusion models generate high-quality results, they incur high computational costs. We overcome this by training an inpainting adapter on a few-step distilled text-to-image model, DMD2, using a novel 3-step adversarial training scheme to ensure realistic, structurally consistent, and visually harmonious inpainted regions. To evaluate TurboFill, we propose two benchmarks: DilationBench, which tests performance across mask sizes, and HumanBench, based on human feedback for complex prompts. Experiments show that TurboFill outperforms both the multi-step BrushNet and few-step inpainting methods, setting a new benchmark for high-performance inpainting tasks. Our project page: https://liangbinxie.github.io/projects/TurboFill/