Generating Visual Stories with Grounded and Coreferent Characters

📅 2024-09-20

🏛️ arXiv.org

🤖 AI Summary

Existing visual story generation methods neglect character modeling, which leads to missing character mentions, ambiguous pronouns, and inconsistent references across frames. To address this, we introduce the **character-centric visual story generation task**, which requires character mentions to be visually grounded in the images and textually coreferent across a multi-image sequence. We construct the first extension of the VIST dataset annotated with visual and textual character coreference chains; design two families of evaluation metrics, covering character richness and coreference quality; and propose a multimodal generation framework that combines automated coreference annotation, character-aware fine-tuning, and joint modeling of visual grounding and textual coreference. Experiments indicate that our approach outperforms baselines and state-of-the-art models on character recurrence rate, visual grounding fidelity, and cross-sentence coreference consistency.
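The summary names "character recurrence rate" as one of the evaluation dimensions but does not define it. As a minimal sketch, assuming a story is already reduced to the character IDs (coreference-chain IDs) mentioned in each sentence, one plausible definition is the share of distinct characters that appear in at least two sentences. The function name `character_recurrence_rate` and this definition are hypothetical illustrations, not the paper's metric.

```python
from collections import defaultdict


def character_recurrence_rate(mentions_per_sentence: list[list[str]]) -> float:
    """Fraction of distinct characters mentioned in at least two sentences.

    Hypothetical metric for illustration only (not the paper's definition).
    mentions_per_sentence[i] holds the character IDs mentioned in sentence i.
    """
    # Map each character ID to the set of sentence indices that mention it.
    sentences_by_char: dict[str, set[int]] = defaultdict(set)
    for idx, chars in enumerate(mentions_per_sentence):
        for char_id in chars:
            sentences_by_char[char_id].add(idx)

    # A story with no character mentions has no recurring characters.
    if not sentences_by_char:
        return 0.0

    # Repeated mentions inside one sentence do not count as recurrence.
    recurring = sum(1 for sents in sentences_by_char.values() if len(sents) >= 2)
    return recurring / len(sentences_by_char)


# Five sentences, one per image (VIST stories pair a sentence with each image).
story = [["girl"], ["girl", "dog"], ["dog"], ["man"], ["girl"]]
print(character_recurrence_rate(story))
```

Here "girl" and "dog" recur while "man" appears once, so two of three characters recur. Counting sentence indices rather than raw mentions keeps a single sentence that repeats a name from inflating the score.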

📝 Abstract

Characters are important in narratives. They move the plot forward, create emotional connections, and embody the story's themes. Existing visual storytelling methods focus on the plot and its events, without building the narrative around specific characters. As a result, the generated stories feel generic, with character mentions that are absent, vague, or incorrect. To mitigate these issues, we introduce the new task of character-centric story generation and present the first model capable of predicting visual stories with consistently grounded and coreferent character mentions. Our model is finetuned on a new dataset which we build on top of the widely used VIST benchmark. Specifically, we develop an automated pipeline to enrich VIST with visual and textual character coreference chains. We also propose new evaluation metrics to measure the richness of characters and coreference in stories. Experimental results show that our model generates stories with recurring characters that are consistent and coreferent to a larger extent than those of baselines and state-of-the-art systems.
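The abstract describes enriching VIST with visual and textual character coreference chains, i.e. linking each character's text mentions to the regions where that character appears in the images. A minimal sketch of such a chain, assuming one sentence per image and axis-aligned bounding boxes, could look like the following. The class names, fields, and the `grounded_fraction` check are hypothetical and do not reflect the authors' actual data format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TextMention:
    sentence_idx: int  # index of the sentence; assumed aligned with image index
    text: str          # surface form, e.g. "a little girl" or "she"


@dataclass(frozen=True)
class BoxMention:
    image_idx: int
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class CharacterChain:
    """Hypothetical record linking one character's textual and visual mentions."""

    char_id: str
    text_mentions: list[TextMention] = field(default_factory=list)
    box_mentions: list[BoxMention] = field(default_factory=list)

    def grounded_fraction(self) -> float:
        """Share of text mentions whose paired image has a box for this character."""
        if not self.text_mentions:
            return 0.0
        frames_with_box = {b.image_idx for b in self.box_mentions}
        grounded = sum(
            1 for m in self.text_mentions if m.sentence_idx in frames_with_box
        )
        return grounded / len(self.text_mentions)


chain = CharacterChain(
    "char_0",
    text_mentions=[
        TextMention(0, "a little girl"),
        TextMention(1, "she"),
        TextMention(3, "her"),
    ],
    box_mentions=[
        BoxMention(0, (12.0, 40.0, 180.0, 300.0)),
        BoxMention(1, (50.0, 20.0, 210.0, 310.0)),
    ],
)
print(chain.grounded_fraction())
```

In this toy chain the mention "her" in sentence 3 has no box in image 3, so two of three mentions are grounded. Keeping text and box mentions in one record per character makes it straightforward to check visual grounding and textual coreference together.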

Problem

Generates visual stories with consistent character mentions.

Addresses generic storytelling by focusing on character-centric narratives.

Introduces new metrics to evaluate character richness and coreference.

Innovation

Character-centric story generation model

Automated pipeline for coreference chains

New metrics for character richness and coreference evaluation
