Embodied Natural Language Interaction (NLI): Speech Input Patterns in Immersive Analytics

📅 2025-10-14
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Current immersive analytics systems lack a systematic characterization of embodied speech cues (such as spatial deixis, action verbs, and body-pose-linked terms) and of how their interplay with non-embodied language affects natural language interaction (NLI). Method: We conducted a Wizard-of-Oz study (N=15), collecting and axial-coding 1,280 speech acts derived from 734 utterances to identify recurrent patterns. Contribution/Results: We propose a taxonomy of five embodied–non-embodied speech input patterns. Using semantic entropy as a metric, we quantify speech input uncertainty and show that users adaptively switch among patterns depending on analysis task, phase, and embodiment reliance. Embodied cues reduce semantic ambiguity and improve the robustness of intent recognition. Our work contributes design implications and an evaluation approach for embodied voice interaction in immersive analytics.

📝 Abstract
Embodiment shapes how users verbally express intent when interacting with data through speech interfaces in immersive analytics. Despite growing interest in Natural Language Interaction (NLI) for visual analytics in immersive environments, users' speech patterns and their use of embodiment cues in speech remain underexplored. Understanding their interplay is crucial to bridging the gap between users' intent and an immersive analytic system. To address this, we report the results from 15 participants in a user study conducted using the Wizard of Oz method. We performed axial coding on 1,280 speech acts derived from 734 utterances, examining how analysis tasks are carried out with embodiment and linguistic features. Next, we measured speech input uncertainty for each analysis task using the semantic entropy of utterances, estimating how uncertain users' speech inputs appear to an analytic system. Through these analyses, we identified five speech input patterns, showing that users dynamically blend embodied and non-embodied speech acts depending on data analysis tasks, phases, and embodiment reliance driven by the counts and types of embodiment cues in each utterance. We then examined how these patterns align with user reflections on factors that challenge speech interaction during the study. Finally, we propose design implications aligned with the five patterns.
Problem

Research questions and friction points this paper is trying to address.

Understanding speech patterns using embodiment cues in immersive analytics
Measuring speech input uncertainty through semantic entropy analysis
Identifying how users blend embodied and non-embodied speech acts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzed speech patterns using axial coding
Measured input uncertainty via semantic entropy
Identified five dynamic speech input patterns
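The paper does not spell out its exact formulation of semantic entropy, but the idea of measuring speech input uncertainty can be sketched as Shannon entropy over clusters of semantically equivalent utterances for a task: the more evenly users' utterances spread across distinct meanings, the more uncertain the input looks to the system. A minimal illustration, with hypothetical cluster labels:

```python
import math
from collections import Counter

def semantic_entropy(cluster_labels):
    """Shannon entropy (bits) over semantic clusters of utterances.

    cluster_labels: one semantic-cluster id per utterance for a task.
    0.0 means all utterances share one meaning; higher values mean
    the speech input is more uncertain from the system's perspective.
    """
    counts = Counter(cluster_labels)
    n = len(cluster_labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical example: 6 utterances grouped into 3 semantic clusters
labels = ["filter", "filter", "filter", "select", "select", "compare"]
print(round(semantic_entropy(labels), 3))  # → 1.459
```

How utterances are clustered (e.g., by embedding similarity or manual coding) is a separate design choice; the entropy computation itself is independent of it.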
Hyemi Song
University of Maryland College Park
Matthew Johnson
University of Maryland College Park
Kirsten Whitley
Department of Defense
Eric Krokos
Department of Defense
Amitabh Varshney
Dean, College of Computer, Mathematical, and Natural Sciences; Professor of Computer Science
Visualization · Computer Graphics · Biological Visualization · Virtual Reality