A Multimodal Framework for Understanding Collaborative Design Processes

📅 2025-08-08
🤖 AI Summary
This study addresses the challenge of integrating and analyzing heterogeneous multimodal data, such as video, audio, handwritten notes, and eye-tracking streams, in collaborative design, where collaboration mechanisms and decision-making processes often remain opaque. We propose a modular, extensible multimodal analysis framework that combines AI-driven artifact extraction, a multi-stream temporal alignment graph, thematic card summarization, and drill-down interactive analysis, implemented in an interactive visual system named reCAPit. The key contribution lies in enabling semantic-level cross-modal fusion and transparent, interpretable explanations, which substantially improves process traceability and comprehensibility. An evaluation across six interdisciplinary workshops, including urban planning and ensemble music rehearsal, demonstrates that the framework uncovers collaborative dynamics, supports decision provenance, and improves communication between researchers and practitioners.

📝 Abstract
An essential task in analyzing collaborative design processes, such as those that are part of workshops in design studies, is identifying design outcomes and understanding how the collaboration between participants formed the results and led to decision-making. However, findings are typically restricted to a consolidated textual form based on notes from interviews or observations. A challenge arises from integrating different sources of observations, which leads to large amounts of heterogeneous collected data. To address this challenge, we propose a practical, modular, and adaptable framework of workshop setup, multimodal data acquisition, AI-based artifact extraction, and visual analysis. Our interactive visual analysis system, reCAPit, allows the flexible combination of different modalities, including video, audio, notes, or gaze, to analyze and communicate important workshop findings. A multimodal streamgraph displays activity and attention in the working area, temporally aligned topic cards summarize participants' discussions, and drill-down techniques allow inspecting raw data of included sources. As part of our research, we conducted six workshops across different themes, ranging from social science research on urban planning to a design study on band-practice visualization. The latter two are examined in detail and described as case studies. Further, we present considerations for planning workshops and challenges that we derive from our own experience and the interviews we conducted with workshop experts. Our research extends existing methodology of collaborative design workshops by promoting data-rich acquisition of multimodal observations, combined AI-based extraction and interactive visual analysis, and transparent dissemination of results.
Problem

Research questions and friction points this paper is trying to address.

Analyzing collaborative design processes with multimodal data integration
Addressing data heterogeneity in workshop observations and outcomes
Enhancing workshop findings through AI extraction and visual analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modular framework for multimodal data acquisition
AI-based artifact extraction for analysis
Interactive visual system for flexible modality combination
Maurice Koch
Visualization Research Center (VISUS), University of Stuttgart
Nelusa Pathmanathan
Visualization Research Center (VISUS), University of Stuttgart
Daniel Weiskopf
Professor of Computer Science, University of Stuttgart
Visualization, Visual Analytics, Computer Graphics, Eye Tracking
Kuno Kurzhals
University of Stuttgart