🤖 AI Summary
Traditional animation production relies on labor-intensive manual keyframing, which raises the technical barrier and slows production. This work proposes a language-driven animation generation framework that uses a large language model (LLM) to interpret natural language prompts and the Segment Anything Model (SAM) for visual grounding and scene geometry understanding. The framework automatically generates animations that respect perspective constraints, depth structure, and occlusion logic. It supports contour-following motion, orbital animations with z-order awareness, and perspective-aligned motion on transformed objects. Three use cases demonstrate the method’s feasibility and its potential to lower the technical threshold for animation creation.
📝 Abstract
Animation elevates digital documents into immersive experiences, yet creating custom motion paths remains cumbersome, requiring designers to manually select presets, plot Bézier points, and configure timing properties. We introduce Generative Animations, a system that transforms natural language prompts into production-ready animations. By chaining Large Language Models (LLMs) for semantic parsing with the Segment Anything Model (SAM) for visual grounding, our pipeline automatically generates motion paths that respect scene geometry, handle depth-based occlusions, and honor 3D perspective transforms. We demonstrate the system through three use cases: contour-following trajectories, orbital animations with z-order awareness, and perspective-aligned motion on transformed objects.
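To make the "orbital animations with z-order awareness" use case concrete, here is a minimal Python sketch of one way such keyframes could be represented. It is not the authors' implementation: the `Keyframe` type, the `orbit_keyframes` function, and the rule that the lower half of the ellipse is the near half are all assumptions made for illustration, and the orbit centre and radii are taken as given rather than derived from a SAM mask.

```python
import math
from dataclasses import dataclass


@dataclass
class Keyframe:
    """Hypothetical keyframe record; not a structure from the paper."""
    t: float         # normalized time in [0, 1)
    x: float         # screen-space x
    y: float         # screen-space y (grows downward)
    in_front: bool   # True when the mover should render above the target


def orbit_keyframes(cx, cy, rx, ry, steps=8):
    """Sample one elliptical orbit around the point (cx, cy).

    Assumption: the ellipse is a circle seen in perspective, so the half
    with larger screen y (lower on screen) is nearer to the viewer.
    """
    frames = []
    for i in range(steps):
        theta = 2 * math.pi * i / steps
        depth = math.sin(theta)  # > 0 on the near half, < 0 on the far half
        frames.append(
            Keyframe(
                t=i / steps,
                x=cx + rx * math.cos(theta),
                y=cy + ry * depth,
                # Small tolerance so the two crossover points count as "in front"
                in_front=depth >= -1e-9,
            )
        )
    return frames


if __name__ == "__main__":
    for kf in orbit_keyframes(100.0, 50.0, 40.0, 10.0):
        layer = "front" if kf.in_front else "behind"
        print(f"t={kf.t:.3f}  ({kf.x:7.2f}, {kf.y:6.2f})  {layer}")
```

A renderer consuming these keyframes would swap the moving object's layer whenever `in_front` changes, which yields the occlusion effect of passing behind the target on the far half of the orbit.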