🤖 AI Summary
This paper addresses the long-overlooked problem of onomatopoeia visual stylization in manga generation. It proposes OnomatoGen, the first end-to-end model designed specifically for onomatopoeia generation and stylistic control. Unlike generic text style transfer methods, OnomatoGen explicitly models the alignment between onomatopoeic text semantics and scene emotion intensity, leveraging the alpha channel to govern the shape, scale, and spatial placement of onomatopoeic glyphs. The framework integrates three key components: (1) fine-grained textual semantic modeling, (2) cross-modal visual–semantic alignment, and (3) alpha-channel–driven controllable image generation. Quantitative and qualitative evaluations show that OnomatoGen significantly outperforms existing approaches in both the visual expressiveness and the contextual coherence of generated onomatopoeia, establishing a new paradigm for semantically grounded, stylistically adaptive, and compositionally aware onomatopoeia rendering in AI-driven manga synthesis.
📝 Abstract
Onomatopoeia is an important element of textual expression in manga. Unlike character dialogue, onomatopoeic expressions are visually stylized, with variations in shape, size, and placement that reflect a scene's intensity and mood. Despite this role, onomatopoeia has received little attention in manga generation. In this paper, we focus on onomatopoeia generation and propose OnomatoGen, which stylizes plain text into an onomatopoeic style. We empirically show that onomatopoeia generation has unique properties that distinguish it from typical text stylization tasks, and that OnomatoGen can effectively render plain text in an onomatopoeic style.