Vistoria: A Multimodal System to Support Fictional Story Writing through Instrumental Text-Image Co-Editing

📅 2025-09-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Existing creative writing tools operate predominantly in the textual modality, which limits support for fiction ideation grounded in visual cognition such as memory, dreams, and metaphor. Method: We present Vistoria, a text-image co-editing system that treats images and text as coequal narrative materials. Drawing on Instrumental Interaction and Structural Mapping theory, it provides multimodal operations (lasso selection, collage-based recombination, filter-based modulation, and perspective shifts) for cross-modal narrative exploration. A formative Wizard-of-Oz co-design study with 10 story writers informed the design, and a controlled study with 12 participants evaluated it. Contribution/Results: The evaluation indicates that co-editing enhances expressiveness, immersion, and collaboration, helping writers explore divergent directions, embrace serendipitous randomness, and trace evolving storylines. Participants reported stronger senses of authorship and agency, although multimodality increased cognitive demand.

📝 Abstract
Humans think visually: we remember in images, dream in pictures, and use visual metaphors to communicate. Yet most creative writing tools remain text-centric, limiting how authors plan and translate ideas. We present Vistoria, a system for synchronized text-image co-editing in fictional story writing that treats visuals and text as coequal narrative materials. A formative Wizard-of-Oz co-design study with 10 story writers revealed how sketches, images, and annotations serve as essential instruments for ideation and organization. Drawing on theories of Instrumental Interaction and Structural Mapping, Vistoria introduces multimodal operations (lasso, collage, filters, and perspective shifts) that enable seamless narrative exploration across modalities. A controlled study with 12 participants shows that co-editing enhances expressiveness, immersion, and collaboration, enabling writers to explore divergent directions, embrace serendipitous randomness, and trace evolving storylines. While multimodality increased cognitive demand, participants reported stronger senses of authorship and agency. These findings demonstrate how multimodal co-editing expands creative potential by balancing abstraction and concreteness in narrative development.

Problem

Research questions and friction points this paper is trying to address.

Supporting fictional story writing with synchronized text-image co-editing
Addressing limitations of text-centric creative writing tools
Enhancing narrative exploration through multimodal instrumental operations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Synchronized text-image co-editing system
Multimodal operations for narrative exploration
Instrumental Interaction enabling cross-modal editing

Authors

Kexue Fu
City University of Hong Kong
HCI, Storytelling, Creativity, Cognition, Human-AI collaboration
Jingfei Huang
Harvard University
HCI, Generative AI, Spatial Perception
Long Ling
Tongji University
Human AI Interaction, HCI, Digital Fabrication
Sumin Hong
Computer Science and Engineering, University of Notre Dame, United States
Yihang Zuo
PhD student @ ASU
EDA, PIM, AI accelerator
Ray LC
City University of Hong Kong, China
Toby Jia-jun Li
Department of Computer Science and Engineering, University of Notre Dame, United States