Gesturing Toward Abstraction: Multimodal Convention Formation in Collaborative Physical Tasks

📅 2026-02-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates how humans, through repeated collaboration, progressively develop efficient shared procedural abstractions, using multimodal communication (speech and gesture) to achieve common goals. By pairing an online language experiment with an augmented reality (AR) laboratory task in which pairs collaboratively build physical towers, the work extends probabilistic convention formation models to embodied multimodal collaboration. AR isolates each partner's hand and voice, disentangling speech and gesture signals so that cross-modal interaction data can be analyzed with probabilistic models. The findings reveal a dynamic shift in modality preference across collaboration stages and show that participants employ cross-modal redundancy to highlight critical changes, substantially improving both the efficiency and accuracy of the joint task. These insights provide a theoretical foundation for designing embodied agents capable of multimodal abstraction.

📝 Abstract
A quintessential feature of human intelligence is the ability to create ad hoc conventions over time to achieve shared goals efficiently. We investigate how communication strategies evolve through repeated collaboration as people coordinate on shared procedural abstractions. To this end, we conducted an online unimodal study (n = 98) using natural language to probe abstraction hierarchies. In a follow-up lab study (n = 40), we examined how multimodal communication (speech and gestures) changed during physical collaboration. Pairs used augmented reality to isolate their partner's hand and voice; one participant viewed a 3D virtual tower and sent instructions to the other, who built the physical tower. Participants became faster and more accurate by establishing linguistic and gestural abstractions and using cross-modal redundancy to emphasize key changes from previous interactions. Based on these findings, we extend probabilistic models of convention formation to multimodal settings, capturing shifts in modality preferences. Our findings and model provide building blocks for designing convention-aware intelligent agents situated in the physical world.
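The abstract's final claim, that probabilistic convention formation models can be extended to capture shifts in modality preference, can be made concrete with a small simulation. The sketch below is illustrative only and is not the authors' model: it assumes a Beta-Bernoulli belief over whether a spoken label has become a shared convention, plus a soft-max choice among speech, gesture, and redundant speech-plus-gesture. All names, costs, and utility values (Partner, MODALITIES, BASE_COST, and so on) are hypothetical.

```python
import numpy as np

# Illustrative sketch of convention formation with a modality-choice layer.
# All quantities are assumptions, not parameters from the paper.

MODALITIES = ("speech", "gesture", "redundant")  # redundant = speech + gesture

# Assumed production costs: a full gestural demonstration is costlier than a
# short spoken label, and producing both modalities is costliest of all.
BASE_COST = {"speech": 0.2, "gesture": 0.5, "redundant": 0.8}

class Partner:
    """Beta posterior over whether a label->abstraction convention holds."""

    def __init__(self, a=1.0, b=1.0):
        self.a, self.b = a, b  # Beta(a, b) prior: convention status unknown

    @property
    def p_convention(self):
        return self.a / (self.a + self.b)

    def observe(self, success):
        # Bayesian update after each round of the collaborative task.
        if success:
            self.a += 1.0
        else:
            self.b += 1.0

def modality_utility(m, p_conv):
    # Assumed informativity: speech alone works only if the convention holds;
    # gesture grounds meaning directly; redundancy combines both signals.
    info = {"speech": p_conv,
            "gesture": 0.8,
            "redundant": min(1.0, p_conv + 0.8)}[m]
    return info - BASE_COST[m]

def choose_modality(p_conv, temperature=0.1):
    # Soft-max choice over expected utilities.
    u = np.array([modality_utility(m, p_conv) for m in MODALITIES])
    p = np.exp(u / temperature)
    p /= p.sum()
    return np.random.choice(MODALITIES, p=p)

partner = Partner()
for round_ in range(10):
    m = choose_modality(partner.p_convention)
    # Assumed outcome model: grounded modalities always succeed; speech alone
    # succeeds with probability equal to the current convention belief.
    success = m != "speech" or np.random.rand() < partner.p_convention
    partner.observe(success)
    print(f"round {round_:2d}: modality={m:9s} "
          f"P(convention)={partner.p_convention:.2f}")
```

Under these assumed costs, the simulated speaker favors grounded gesture early and shifts to cheap conventionalized speech as the posterior probability of a shared convention rises, mirroring in miniature the modality-preference shift the abstract describes.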
Problem

Research questions and friction points this paper is trying to address.

multimodal communication
convention formation
collaborative tasks
procedural abstraction
gesture
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal communication
convention formation
procedural abstraction
cross-modal redundancy
probabilistic modeling