AI Summary
This study addresses the limitations of existing wheelchair assistance systems, which lack the ability to clarify ambiguous user intentions and are hindered by the absence of multimodal datasets supporting natural dialogue. To bridge this gap, the authors propose a dialogue-driven interaction framework that employs a dual-room Wizard-of-Oz paradigm to simulate robotic autonomy, thereby eliciting natural user behaviors. The system simultaneously captures five modalities: RGB-D video, speech, IMU signals, end-effector poses of the robotic arm, and full-body joint states. This framework is the first tailored to ambiguity clarification tasks in assistive robotics and effectively captures diverse types of conversational ambiguity. A pilot dataset comprising 53 trials from five participants demonstrates high data quality and validates the method's efficacy, laying the groundwork for large-scale dataset collection and the development of ambiguity-aware control algorithms.
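To make the recorded data concrete, the sketch below shows what one synchronized sample across the five modalities might look like. This is an illustrative assumption, not the paper's actual data format: the `MultimodalFrame` container, field names, and shapes are all hypothetical.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MultimodalFrame:
    """One synchronized sample across the five recorded modalities.

    Hypothetical schema for illustration only; the paper's real storage
    format, field names, and shapes are not specified in the abstract.
    """
    timestamp: float           # shared clock across all streams, seconds
    rgb: np.ndarray            # HxWx3 uint8 color image
    depth: np.ndarray          # HxW float32 depth map, meters
    audio_chunk: np.ndarray    # mono PCM samples for this frame's window
    imu: np.ndarray            # e.g., [ax, ay, az, gx, gy, gz]
    ee_pose: np.ndarray        # arm end-effector pose [x, y, z, qx, qy, qz, qw]
    joint_states: np.ndarray   # whole-body joint positions (wheelchair + arm)
```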
Abstract
Integrated control of wheelchairs and wheelchair-mounted robotic arms (WMRAs) has strong potential to increase independence for users with severe motor limitations, yet existing interfaces often lack the flexibility needed for intuitive assistive interaction. Although data-driven AI methods show promise, progress is limited by the lack of multimodal datasets that capture natural Human-Robot Interaction (HRI), particularly conversational ambiguity in dialogue-driven control. To address this gap, we propose a multimodal data collection framework that employs a dialogue-based interaction protocol and a two-room Wizard-of-Oz (WoZ) setup to simulate robot autonomy while eliciting natural user behavior. Across five assistive tasks, the framework records five synchronized modalities: RGB-D video, conversational audio, inertial measurement unit (IMU) signals, end-effector Cartesian pose, and whole-body joint states. Using this framework, we collected a pilot dataset of 53 trials from five participants and validated its quality through motion smoothness analysis and user feedback. The results show that the framework effectively captures diverse ambiguity types and supports natural dialogue-driven interaction, demonstrating its suitability for scaling to a larger dataset for learning, benchmarking, and evaluation of ambiguity-aware assistive control.
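The abstract does not state which smoothness measure was used, so the following is only a plausible sketch: log dimensionless jerk (LDLJ), a standard trajectory-smoothness metric, computed from recorded end-effector positions. The function name and signature are assumptions for illustration.

```python
import numpy as np

def log_dimensionless_jerk(positions: np.ndarray, dt: float) -> float:
    """Log dimensionless jerk (LDLJ) of a trajectory; higher = smoother.

    A common smoothness metric (Balasubramanian et al., 2015); the paper's
    actual smoothness analysis may differ. `positions` is an (N, 3) array
    of end-effector positions sampled at 1/dt Hz.
    """
    vel = np.gradient(positions, dt, axis=0)    # (N, 3) velocity
    acc = np.gradient(vel, dt, axis=0)          # (N, 3) acceleration
    jerk = np.gradient(acc, dt, axis=0)         # (N, 3) jerk
    duration = dt * (len(positions) - 1)        # total movement time, s
    v_peak = np.max(np.linalg.norm(vel, axis=1))
    jerk_sq_integral = np.trapz(np.sum(jerk**2, axis=1), dx=dt)
    dimensionless = (duration**3 / v_peak**2) * jerk_sq_integral
    return -np.log(dimensionless)               # less negative = smoother
```

Applied to each trial's end-effector trajectory, a score like this lets wizard-controlled motions be compared across trials and participants for consistent smoothness.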