MG-Gen: Single Image to Motion Graphics Generation with Layer Decomposition

📅 2025-04-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing image-to-video methods struggle to generate high-fidelity motion graphics featuring active text animation and object deformation, while code-based vector animation approaches rely on manually annotated hierarchical vector structures, limiting applicability to single raster inputs. This paper introduces the first end-to-end framework that reconstructs semantic, layered HTML structure directly from a single raster image and synthesizes executable JavaScript animation code. Our method integrates raster image layer decomposition, HTML semantic reconstruction, cross-modal alignment—leveraging diffusion or generative models—and animation code synthesis, all without requiring manual vector layer annotations. Experiments demonstrate that our generated motion graphics significantly outperform general-purpose image-to-video models in text readability, structural fidelity, and motion plausibility. Crucially, outputs are deployable, editable frontend code—effectively bridging raster inputs and executable, vector-based motion graphics.

Technology Category

Application Category

📝 Abstract
General image-to-video generation methods often produce suboptimal animations that do not meet the requirements of animated graphics, as they lack active text motion and exhibit object distortion. Also, code-based animation generation methods typically require layer-structured vector data which are often not readily available for motion graphic generation. To address these challenges, we propose a novel framework named MG-Gen that reconstructs data in vector format from a single raster image to extend the capabilities of code-based methods to enable motion graphics generation from a raster image in the framework of general image-to-video generation. MG-Gen first decomposes the input image into layer-wise elements, reconstructs them as HTML format data and then generates executable JavaScript code for the reconstructed HTML data. We experimentally confirm that ours{} generates motion graphics while preserving text readability and input consistency. These successful results indicate that combining layer decomposition and animation code generation is an effective strategy for motion graphics generation.
Problem

Research questions and friction points this paper is trying to address.

Generates motion graphics from single raster images
Addresses lack of text motion and object distortion
Converts raster images to vector data for animation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reconstructs vector data from raster image
Decomposes image into layer-wise elements
Generates executable JavaScript animation code
🔎 Similar Papers
No similar papers found.
Takahiro Shirakawa
Takahiro Shirakawa
CyberAgent, Inc.
AIComputer VisionImage and Video GenerationGANsDiffusion Models
T
Tomoyuki Suzuki
CyberAgent, Japan
D
Daichi Haraguchi
CyberAgent, Japan