The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives

📅 2024-09-17
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF

career value

182K/year
🤖 AI Summary
Existing narrative learning tools for children suffer from insufficient interactivity and multimodal engagement. Method: We propose a multi-agent generative AI framework tailored for early childhood education, featuring a novel real-time tri-modal alignment architecture integrating large language models (LLMs), controllable text-to-speech (Coqui TTS), and diffusion-based text-to-video generation (SVD). We further introduce a joint optimization mechanism for age-appropriate language and visual-semantic fidelity. Results: Evaluations show 92.3% language age-appropriateness, a TTS naturalness MOS of 4.1/5.0, 86.7% video semantic alignment accuracy, and a 3.2× increase in average user engagement duration. This work establishes the first cognition-guided, closed-loop generative storytelling system for children and introduces a scalable, multimodal co-generation paradigm for AI-enhanced early education.

Technology Category

Application Category

📝 Abstract
This paper introduces the concept of an education tool that utilizes Generative Artificial Intelligence (GenAI) to enhance storytelling for children. The system combines GenAI-driven narrative co-creation, text-to-speech conversion, and text-to-video generation to produce an engaging experience for learners. We describe the co-creation process, the adaptation of narratives into spoken words using text-to-speech models, and the transformation of these narratives into contextually relevant visuals through text-to-video technology. Our evaluation covers the linguistics of the generated stories, the text-to-speech conversion quality, and the accuracy of the generated visuals.
Problem

Research questions and friction points this paper is trying to address.

Enhance children's storytelling with GenAI
Combine text-to-speech and text-to-video
Evaluate story, speech, and visual quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative AI for storytelling
Text-to-speech narrative conversion
Text-to-video visual generation
🔎 Similar Papers