🤖 AI Summary
Current generative AI tools lack task decomposition capabilities, systematic iterative refinement mechanisms, and support for exploratory navigation of the generative space—limiting high-quality, personalized multimodal media creation. To address this, we propose DeckFlow, a specification-driven generative AI tool designed for multimodal (text/image/audio) content synthesis. Methodologically, DeckFlow integrates: (1) an infinite visual dataflow canvas enabling interconnected subtask management; (2) annotation clustering to support hierarchical goal decomposition and progressive specification refinement; and (3) grid-based multi-variant generation coupled with recursive feedback loops for systematic exploration of the generative space. Technically, it unifies multimodal foundation models, a visual dataflow interface, clustering-guided specification annotation, prompt variant sampling, and iterative scaffolding design. Empirical evaluation compares DeckFlow against a state-of-practice conversational-AI baseline on text-to-image generation tasks and examines how it supports cross-modal creative workflows and structured user participation.
📝 Abstract
Generative AI promises to allow people to create high-quality personalized media. Although these tools are powerful, our literature review identifies three fundamental design problems with existing tooling. We introduce DeckFlow, a multimodal generative AI tool, to address these problems. First, DeckFlow supports task decomposition by allowing users to maintain multiple interconnected subtasks on an infinite canvas populated by cards connected through visual dataflow affordances. Second, DeckFlow supports a specification decomposition workflow in which an initial goal is iteratively broken into smaller parts and recombined using feature labels and clusters. Finally, DeckFlow supports generative space exploration by generating multiple prompt and output variations, presented in a grid, which can feed back recursively into the next design iteration. We evaluate DeckFlow for text-to-image generation against a state-of-practice conversational AI baseline on image generation tasks. We then add audio generation and investigate user behaviors in a more open-ended creative setting with text, image, and audio outputs.