🤖 AI Summary
This study investigates the intertextuality mechanisms between human-authored textual intent and AI-generated images in collaborative visual storytelling involving novice users and vision-language models (VLMs). Using GPT-4o's image generation capability, we conducted a three-phase qualitative study integrated with fuzzy-set qualitative comparative analysis (fsQCA) to identify three core collaborative strategies: prompt iteration, semantic expansion, and multimodal complementarity. We propose a theoretical framework of "text-image intertextuality," characterizing four collaborative patterns and three empirically derived pathways to successful collaboration: the Educational Collaborator, the Technical Expert, and the Visual Thinker. Findings demonstrate that AI-induced semantic surplus enhances creative ideation, while also revealing critical challenges: insufficient cultural representation, weak visual consistency, and difficulties in narrative translation. The work provides empirical grounding and interface-design implications for developing human-centered, role-adaptive AI assistants in creative authoring contexts.
📝 Abstract
Creating meaningful visual narratives through human-AI collaboration requires understanding how text-image intertextuality emerges when textual intentions meet AI-generated visuals. We conducted a three-phase qualitative study with 15 participants using GPT-4o to investigate how novices navigate sequential visual narratives. Our findings show that users develop strategies to harness AI's semantic surplus by recognizing meaningful visual content beyond literal descriptions, iteratively refining prompts, and constructing narrative significance through complementary text-image relationships. We identified four distinct collaboration patterns and, through fsQCA analysis, discovered three pathways to successful intertextual collaboration: Educational Collaborator, Technical Expert, and Visual Thinker. However, participants faced challenges, including cultural representation gaps, visual consistency issues, and difficulties translating narrative concepts into visual prompts. These findings contribute to HCI research by providing an empirical account of *text-image intertextuality* in human-AI co-creation and proposing design implications for role-based AI assistants that better support iterative, human-led creative processes in visual storytelling.