🤖 AI Summary
This paper addresses the challenge of structured semantic understanding in visual narratives (e.g., comics). We propose a hierarchical multimodal knowledge graph framework that decomposes narratives into three levels of granularity—story arcs, event segments, and panels—unifying semantic and spatiotemporal modeling across levels. Our key innovation is a multi-granularity alignment mechanism that enables panel-level visual–textual coupling and cross-level symbolic reasoning. The framework constructs multimodal graphs at each level, fuses them hierarchically, and is applied to a manually annotated subset of the Manga109 dataset. Evaluated on four tasks—action retrieval, dialogue tracing, character appearance mapping, and panel timeline reconstruction—it achieves high precision and recall. Experiments demonstrate clear advantages in interpretability, consistency of multimodal representation, and cross-task generalization.
📝 Abstract
This paper presents a hierarchical knowledge graph framework for the structured understanding of visual narratives, focusing on multimodal media such as comics. The proposed method decomposes narrative content into multiple levels, from macro-level story arcs to fine-grained event segments. It represents them through integrated knowledge graphs that capture semantic, spatial, and temporal relationships. At the panel level, we construct multimodal graphs that link visual elements such as characters, objects, and actions with corresponding textual components, including dialogue and captions. These graphs are integrated across narrative levels to support reasoning over story structure, character continuity, and event progression. We apply our approach to a manually annotated subset of the Manga109 dataset and demonstrate its ability to support symbolic reasoning across diverse narrative tasks, including action retrieval, dialogue tracing, character appearance mapping, and panel timeline reconstruction. Evaluation results show high precision and recall across tasks, validating the coherence and interpretability of the framework. This work contributes a scalable foundation for narrative-based content analysis, interactive storytelling, and multimodal reasoning in visual media.
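To make the panel-level construction concrete, here is a minimal sketch of what a multimodal panel graph might look like: visual nodes (characters, objects, actions) and textual nodes (dialogue, captions) connected by typed edges, with a simple query that couples a character to their dialogue. All class and relation names here are illustrative assumptions, not the paper's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    modality: str   # "visual" or "textual" (assumed modality split)
    node_type: str  # e.g. "character", "object", "action", "dialogue", "caption"
    label: str

@dataclass
class PanelGraph:
    """Hypothetical panel-level multimodal graph with typed relation edges."""
    panel_id: str
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (source_id, relation, target_id)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        self.edges.append((src, relation, dst))

    def dialogue_of(self, character_id: str) -> list:
        # Panel-level visual-textual coupling: dialogue linked to a character
        # via an assumed "speaks" relation.
        return [self.nodes[dst].label
                for src, rel, dst in self.edges
                if src == character_id and rel == "speaks"]

# Build a toy panel graph
g = PanelGraph("panel_001")
g.add_node(Node("c1", "visual", "character", "Hero"))
g.add_node(Node("a1", "visual", "action", "running"))
g.add_node(Node("t1", "textual", "dialogue", "Let's go!"))
g.add_edge("c1", "performs", "a1")
g.add_edge("c1", "speaks", "t1")

print(g.dialogue_of("c1"))  # prints ["Let's go!"]
```

Graphs like this, one per panel, could then be linked upward to event-segment and story-arc nodes, which is the cross-level integration the abstract describes.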