A Multimodal Data Collection Framework for Dialogue-Driven Assistive Robotics to Clarify Ambiguities: A Wizard-of-Oz Pilot Study

πŸ“… 2026-01-23
πŸ€– AI Summary
This study addresses the limitations of existing wheelchair assistance systems, which lack the ability to clarify ambiguous user intentions and are hindered by the absence of multimodal datasets supporting natural dialogue. To bridge this gap, the authors propose a dialogue-driven interaction framework that employs a dual-room Wizard-of-Oz paradigm to simulate robotic autonomy, thereby eliciting natural user behaviors. The system simultaneously captures five modalities: RGB-D video, speech, IMU signals, end-effector poses of the robotic arm, and full-body joint states. This framework is the first tailored to ambiguity clarification tasks in assistive robotics and effectively captures diverse types of conversational ambiguity. A pilot dataset comprising 53 trials from five participants demonstrates high data quality and validates the method’s efficacy, laying the groundwork for large-scale dataset collection and the development of ambiguity-aware control algorithms.

πŸ“ Abstract
Integrated control of wheelchairs and wheelchair-mounted robotic arms (WMRAs) has strong potential to increase independence for users with severe motor limitations, yet existing interfaces often lack the flexibility needed for intuitive assistive interaction. Although data-driven AI methods show promise, progress is limited by the lack of multimodal datasets that capture natural Human-Robot Interaction (HRI), particularly conversational ambiguity in dialogue-driven control. To address this gap, we propose a multimodal data collection framework that employs a dialogue-based interaction protocol and a two-room Wizard-of-Oz (WoZ) setup to simulate robot autonomy while eliciting natural user behavior. The framework records five synchronized modalities: RGB-D video, conversational audio, inertial measurement unit (IMU) signals, end-effector Cartesian pose, and whole-body joint states across five assistive tasks. Using this framework, we collected a pilot dataset of 53 trials from five participants and validated its quality through motion smoothness analysis and user feedback. The results show that the framework effectively captures diverse ambiguity types and supports natural dialogue-driven interaction, demonstrating its suitability for scaling to a larger dataset for learning, benchmarking, and evaluation of ambiguity-aware assistive control.
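The abstract describes recording five synchronized modalities (RGB-D video, conversational audio, IMU signals, end-effector Cartesian pose, whole-body joint states) across assistive tasks. As a minimal illustration of what one time-aligned record might look like, here is a hypothetical per-timestep schema with nearest-neighbor timestamp alignment; the field names, shapes, and `nearest_sample` helper are assumptions for illustration, not the paper's actual data format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class MultimodalSample:
    """Hypothetical bundle of the five recorded modalities at one timestamp."""
    t: float                   # shared clock timestamp (s) used for synchronization
    rgbd_frame_id: int         # index into the RGB-D video stream
    audio_offset_s: float      # offset into the conversational audio track
    imu: List[float]           # e.g. 6 values: 3-axis accelerometer + 3-axis gyroscope
    ee_pose: List[float]       # end-effector Cartesian pose [x, y, z, qx, qy, qz, qw]
    joint_states: List[float]  # whole-body joint positions

def nearest_sample(samples: List[MultimodalSample], t_query: float) -> MultimodalSample:
    """Align a query time to the closest recorded sample (nearest-neighbor sync)."""
    return min(samples, key=lambda s: abs(s.t - t_query))
```

Because the streams arrive at different rates, some alignment rule is needed when pairing them for learning or evaluation; nearest-neighbor matching on a shared clock is one simple choice.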
Problem

Research questions and friction points this paper is trying to address.

assistive robotics
multimodal dataset
dialogue-driven control
conversational ambiguity
Human-Robot Interaction
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal data collection
dialogue-driven assistive robotics
Wizard-of-Oz
conversational ambiguity
human-robot interaction
Guangping Liu
Aerospace and Mechanical Engineering Department, Saint Louis University, St. Louis, MO, 63103, United States
Nicholas Hawkins
Aerospace and Mechanical Engineering Department, Saint Louis University, St. Louis, MO, 63103, United States
Billy Madden
Aerospace and Mechanical Engineering Department, Saint Louis University, St. Louis, MO, 63103, United States
Tipu Sultan
Aerospace and Mechanical Engineering Department, Saint Louis University, St. Louis, MO, 63103, United States
Flavio Esposito
Associate Professor, Saint Louis University, USA
Computer Networking Β· Distributed Systems Β· Network Virtualization Β· Network Management
Madi Babaiasl
Aerospace and Mechanical Engineering Department, Saint Louis University, St. Louis, MO, 63103, United States