🤖 AI Summary
This work addresses the challenge of balancing semantic fidelity and inference efficiency in controllable image generation. The authors propose an efficient diffusion-based editing framework that treats the latent space as a Riemannian manifold whose geometric structure is learned by a Mamba architecture. The approach integrates dual-SLERP geodesic interpolation, target-aware prompt enhancement, and task-specific attention pruning to achieve precise semantic control. The method significantly outperforms current state-of-the-art techniques while sustaining real-time inference even with 50% of attention tokens pruned, thereby striking an effective balance between high-fidelity semantic manipulation and practical inference speed.
📝 Abstract
Controllable image generation is fundamental to the success of modern generative AI, yet it faces a critical trade-off between semantic fidelity and inference speed. We present RemEdit, a diffusion-based framework that addresses this trade-off with two synergistic innovations. First, for editing fidelity, we navigate the latent space as a Riemannian manifold. A Mamba-based module efficiently learns the manifold's structure, enabling direct and accurate geodesic path computation for smooth semantic edits. This control is further refined by a dual-SLERP blending technique and a goal-aware prompt enrichment pass from a Vision-Language Model. Second, for additional acceleration, we introduce a novel task-specific attention pruning mechanism. A lightweight pruning head learns to retain the tokens essential to the edit, enabling aggressive pruning without the semantic degradation common in content-agnostic approaches. RemEdit surpasses prior state-of-the-art editing frameworks while maintaining real-time performance under 50% pruning. Consequently, RemEdit establishes a new benchmark for practical and powerful image editing. Source code: https://www.github.com/eashanadhikarla/RemEdit.
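The dual-SLERP blending mentioned above can be illustrated with a minimal sketch. Spherical linear interpolation (SLERP) follows a great-circle arc between two latent vectors rather than a straight chord, which keeps interpolated latents on (approximately) the same hypersphere diffusion latents occupy. The `dual_slerp` composition below is a hypothetical reading of the paper's two-stage blend (source → edit direction, then → prompt-conditioned latent); the function names and the staging are assumptions, not the authors' implementation.

```python
import numpy as np

def slerp(z0, z1, t):
    """Spherical linear interpolation between latent vectors z0 and z1 at fraction t."""
    z0n = z0 / np.linalg.norm(z0)
    z1n = z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(z0n, z1n), -1.0, 1.0))  # angle between vectors
    if np.isclose(omega, 0.0):
        return (1 - t) * z0 + t * z1  # nearly parallel: fall back to linear interpolation
    return (np.sin((1 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

def dual_slerp(z_src, z_edit, z_prompt, t, s):
    """Hypothetical two-stage blend: move along the geodesic toward the edit
    latent, then blend the result toward the prompt-conditioned latent."""
    z_mid = slerp(z_src, z_edit, t)
    return slerp(z_mid, z_prompt, s)
```

For unit-norm inputs, SLERP stays on the unit sphere at every `t`, which is the property that motivates using it over plain linear interpolation in diffusion latent spaces.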
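The task-specific pruning idea can likewise be sketched in a few lines. In the actual framework the per-token importance scores come from a learned lightweight head; here the scores are taken as given, and the helper simply keeps the top fraction of tokens while preserving their original order. The function name, signature, and the top-k selection rule are illustrative assumptions.

```python
import numpy as np

def prune_tokens(tokens, scores, keep_ratio=0.5):
    """Keep the highest-scoring keep_ratio fraction of tokens.

    tokens: (N, D) array of token embeddings
    scores: (N,) importance scores (from a learned pruning head in practice;
            treated as an input here, which is an assumption)
    Returns the retained tokens and their original indices.
    """
    n_keep = max(1, int(len(tokens) * keep_ratio))
    keep_idx = np.argsort(scores)[-n_keep:]  # indices of the top-scoring tokens
    keep_idx.sort()  # restore original token order for downstream attention
    return tokens[keep_idx], keep_idx
```

At `keep_ratio=0.5` this corresponds to the "50% pruning" operating point cited in the abstract: attention then runs over half the tokens, roughly quartering the cost of the quadratic attention term.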