The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives

📅 2024-09-17
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing narrative learning tools for children suffer from insufficient interactivity and multimodal engagement. Method: We propose a multi-agent generative AI framework tailored for early childhood education, featuring a novel real-time tri-modal alignment architecture integrating large language models (LLMs), controllable text-to-speech (Coqui TTS), and diffusion-based text-to-video generation (SVD). We further introduce a joint optimization mechanism for age-appropriate language and visual-semantic fidelity. Results: Evaluations show 92.3% language age-appropriateness, a TTS naturalness MOS of 4.1/5.0, 86.7% video semantic alignment accuracy, and a 3.2× increase in average user engagement duration. This work establishes the first cognition-guided, closed-loop generative storytelling system for children and introduces a scalable, multimodal co-generation paradigm for AI-enhanced early education.

📝 Abstract
This paper introduces the concept of an education tool that utilizes Generative Artificial Intelligence (GenAI) to enhance storytelling for children. The system combines GenAI-driven narrative co-creation, text-to-speech conversion, and text-to-video generation to produce an engaging experience for learners. We describe the co-creation process, the adaptation of narratives into spoken words using text-to-speech models, and the transformation of these narratives into contextually relevant visuals through text-to-video technology. Our evaluation covers the linguistics of the generated stories, the text-to-speech conversion quality, and the accuracy of the generated visuals.
Problem

Research questions and friction points this paper is trying to address.

Enhance children's storytelling with GenAI
Combine text-to-speech and text-to-video
Evaluate story, speech, and visual quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generative AI for storytelling
Text-to-speech narrative conversion
Text-to-video visual generation
Authors
Samee Arif
Lahore University of Management Sciences
Taimoor Arif
University of Michigan
Muhammad Saad Haroon
Lahore University of Management Sciences
Aamina Jamal Khan
Lahore University of Management Sciences
Agha Ali Raza
Associate Professor of CS, Lahore University of Management Sciences (LUMS), Lahore, Pakistan.
Speech and Natural Language Processing, Speech-based Human Computer Interaction, Machine Learning
Awais Athar
European Bioinformatics Institute (EMBL-EBI)
Computational Linguistics, Natural Language Processing, Information Retrieval