🤖 AI Summary
This work addresses fine-grained, font-aware text editing in professional poster design: precisely modifying localized text under multi-font stylistic constraints while preserving visual harmony and layout intent. We propose a diffusion-based framework that requires neither font labels nor test-time fine-tuning. Our method integrates a font-aware text rendering module with a multi-region collaborative editing mechanism, enabling arbitrary font control solely from user-provided cropped glyph samples. Compared to existing image editing models, ours achieves significantly improved text fidelity and visual realism, attaining state-of-the-art performance across multiple benchmarks. Notably, it supports complex fonts, including handwritten styles, and enables professional-grade, typographically consistent editing without access to font metadata or explicit font identification.
📝 Abstract
Artistic design such as poster design often demands rapid yet precise modification of textual content while preserving visual harmony and typographic intent, especially across diverse font styles. Although modern image editing models have grown increasingly powerful, they still fall short in fine-grained, font-aware text manipulation, limiting their utility in professional design workflows such as poster editing. To address this issue, we present SkyReels-Text, a novel font-controllable framework for precise poster text editing. Our method enables simultaneous editing of multiple text regions, each rendered in a distinct typographic style, while preserving the visual appearance of non-edited regions. Notably, our model requires neither font labels nor fine-tuning during inference: users can simply provide cropped glyph patches corresponding to their desired typography, even if the font is not included in any standard library. In extensive experiments on multiple datasets, including handwritten text benchmarks, SkyReels-Text achieves state-of-the-art performance in both text fidelity and visual realism, offering unprecedented control over font families and stylistic nuances. This work bridges the gap between general-purpose image editing and professional-grade typographic design.