🤖 AI Summary
This survey addresses the critical multimodal task of vision-driven story generation, systematically reviewing representative works from 2015 to 2024. Motivated by three key problems (the lack of a unified analytical framework, ambiguous task boundaries, and outdated evaluation protocols), we propose, for the first time, a unified cross-task framework encompassing image/video captioning, visual question answering (VQA), and story generation, clarifying patterns of methodological transfer and the fundamental distinctions among these tasks. We critically examine prevalent datasets and metrics (e.g., BLEU, CIDEr, SPICE), exposing their limitations in capturing explainability, controllability, and long-range narrative coherence, and we advocate for evaluation reforms aligned with these dimensions. Synthesizing advances in deep learning, multimodal alignment (e.g., CLIP-style architectures), and sequence modeling (e.g., Transformers), we identify current bottlenecks and chart a roadmap toward robust, trustworthy visual storytelling, providing both theoretical foundations and practical guidance for next-generation research.
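To make the critique of n-gram metrics concrete, the sketch below (a hypothetical illustration, not code or data from the survey) uses NLTK's sentence-level BLEU to score two candidate stories against a reference: a faithful paraphrase with little word overlap, and an incoherent shuffle of the reference's own words. The story strings and the choice of NLTK are assumptions made here for illustration only.

```python
# Hypothetical illustration of a limitation of n-gram metrics such as BLEU:
# surface overlap is rewarded even when narrative coherence is destroyed.
# Assumes NLTK is installed (pip install nltk).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ("the dog found a ball in the park and "
             "carried it home to its owner").split()

# A coherent paraphrase of the same story, with little lexical overlap.
paraphrase = ("a puppy discovered a toy outside and "
              "brought it back to its family").split()

# An incoherent shuffle that reuses the reference's exact vocabulary.
shuffled = ("home the ball a dog the park found in and "
            "its owner to carried it").split()

smooth = SmoothingFunction().method1  # avoids zero scores for short texts
print("paraphrase BLEU:", sentence_bleu([reference], paraphrase,
                                        smoothing_function=smooth))
print("shuffled BLEU:  ", sentence_bleu([reference], shuffled,
                                        smoothing_function=smooth))
# The incoherent shuffle typically outscores the faithful paraphrase,
# since BLEU measures n-gram overlap rather than narrative quality.
```

The shuffled text scores higher despite being unreadable, which illustrates why surface-overlap metrics struggle to reflect long-range narrative coherence in story generation.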
📝 Abstract
Creating engaging narratives from visual data is crucial for automated digital media consumption, assistive technologies, and interactive entertainment. This survey covers methodologies for generating such narratives, focusing on their principles, strengths, and limitations. It also covers tasks related to automatic story generation, such as image and video captioning and visual question answering, as well as story generation without visual inputs. These tasks share common challenges with visual story generation and have inspired many of the techniques used in the field. We analyze the main datasets and evaluation metrics, providing a critical perspective on their limitations.