AniSora: Exploring the Frontiers of Animation Video Generation in the Sora Era

📅 2024-12-13
🏛️ arXiv.org
📈 Citations: 1
Influential: 0

🤖 AI Summary
Existing video generation models (e.g., Sora, Kling) excel on natural videos but struggle to capture animation-specific characteristics, including stylized aesthetics, non-physical motion, and exaggerated deformations, and the field lacks dedicated evaluation protocols for animation. To address this gap, the authors propose AniSora, an end-to-end framework tailored to animation video generation. It comprises: (1) a scalable data processing pipeline that yields over 10M high-quality samples; (2) a spatiotemporal masking module that supports image-to-video generation, frame interpolation, and localized image-guided animation within a single model; and (3) AnimEval, an animation-specific benchmark of 948 diverse animated clips, along with evaluation metrics developed specifically for animation video generation. Extensive experiments demonstrate substantial improvements over general-purpose video diffusion models across multiple animation generation tasks. The entire project is publicly released to advance standardization in animation-oriented AIGC research.

📝 Abstract
Animation has attracted significant interest in the film and TV industry in recent years. Despite the success of advanced video generation models such as Sora, Kling, and CogVideoX on natural videos, they are far less effective on animation videos. Evaluating animation video generation is also challenging because of its unique artistic styles, motion that violates the laws of physics, and exaggerated movement. In this paper, we present AniSora, a comprehensive system for animation video generation that includes a data processing pipeline, a controllable generation model, and an evaluation benchmark. Supported by a data processing pipeline that yields over 10M high-quality samples, the generation model incorporates a spatiotemporal mask module that enables key animation production functions such as image-to-video generation, frame interpolation, and localized image-guided animation. We also collect an evaluation benchmark of 948 diverse animation videos, with metrics developed specifically for animation video generation. The entire project is publicly available at https://github.com/bilibili/Index-anisora/tree/main.
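The abstract states that one spatiotemporal mask module covers image-to-video generation, frame interpolation, and localized image-guided animation, but it does not describe the mask layout. As a rough, hypothetical illustration only, the sketch below shows how a single binary mask over (frame, height, width) could express all three tasks: 1 marks a position supplied as a condition, 0 marks a position the model must generate. All names here (`build_mask`, `condition_frame`, the task strings, the half-open region convention) are assumptions rather than AniSora's code, the "localized" case is assumed to mean conditioning only a region of the first frame, and how the real model consumes such a mask (e.g. on latents inside its diffusion backbone) is not modeled.

```python
from typing import List, Optional, Tuple

# Hypothetical layout: mask[t][y][x] == 1 means position (y, x) of frame t is
# supplied as a condition; 0 means the model must generate that position.
Mask = List[List[List[int]]]
Region = Tuple[int, int, int, int]  # (y0, y1, x0, x1), half-open bounds


def empty_mask(frames: int, height: int, width: int) -> Mask:
    """All-zero mask: nothing is conditioned, everything is generated."""
    return [[[0] * width for _ in range(height)] for _ in range(frames)]


def condition_frame(mask: Mask, t: int, region: Optional[Region] = None) -> None:
    """Mark frame t as known, either entirely or only inside `region`."""
    height, width = len(mask[t]), len(mask[t][0])
    y0, y1, x0, x1 = region if region is not None else (0, height, 0, width)
    for y in range(y0, y1):
        for x in range(x0, x1):
            mask[t][y][x] = 1


def build_mask(task: str, frames: int, height: int, width: int,
               region: Optional[Region] = None) -> Mask:
    """Hypothetical task-to-mask mapping; not the AniSora implementation."""
    mask = empty_mask(frames, height, width)
    if task == "image_to_video":
        condition_frame(mask, 0)               # first frame is given
    elif task == "interpolation":
        condition_frame(mask, 0)               # first keyframe is given
        condition_frame(mask, frames - 1)      # last keyframe is given
    elif task == "localized":
        if region is None:
            raise ValueError("localized task needs a region")
        condition_frame(mask, 0, region)       # assumed: only part of frame 0 is given
    else:
        raise ValueError(f"unknown task: {task}")
    return mask


def conditioned_frames(mask: Mask) -> List[int]:
    """Indices of frames that carry at least one conditioned position."""
    return [t for t, frame in enumerate(mask) if any(any(row) for row in frame)]


if __name__ == "__main__":
    for task in ("image_to_video", "interpolation"):
        print(task, conditioned_frames(build_mask(task, 8, 4, 4)))
    local = build_mask("localized", 8, 4, 4, region=(1, 3, 1, 3))
    print("localized", sum(map(sum, local[0])), "known pixels in frame 0")
```

Running the script prints `[0]` for image-to-video, `[0, 7]` for interpolation, and 4 known pixels for the localized case. The point of the design is that the three tasks differ only in which mask entries are set, so one model trained on varied masks could in principle serve all of them.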
Problem

Addressing the weaker performance of advanced video generation models on animation videos
Overcoming challenges in evaluating unique animation styles and motions
Developing a comprehensive system for controllable animation generation
Innovation

Data processing pipeline yielding over 10M high-quality samples
Spatiotemporal mask module for key animation functions
Evaluation benchmark with 948 diverse animation videos