Toward a Unified Framework for Collaborative Design of Human-AI Interaction

📅 2026-05-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing multimodal human-AI interaction systems often treat alignment, explainability, and user agency in isolation, leaving users with a poor understanding of AI intent and a diminished sense of trust and control. This work proposes a unified collaborative framework that co-designs multimodal alignment, real-time explainable feedback (visual, textual, and spoken), and user intervention mechanisms within a continuous interaction paradigm. By closing the loop between multimodal intent recognition and responsive feedback, the framework aims to improve users’ comprehension of system behavior, perceived control, and overall transparency. The framework is presented through two scenarios, collaborative design and extended reality warehouse robot collaboration, chosen to span differences in time pressure and error reversibility.

📝 Abstract

Human-computer interaction is shifting from screen-based systems to multimodal interfaces in which artificial-intelligence-powered systems increasingly interpret user intent through speech, gesture, and gaze. Yet users rarely understand how these interpretations are made, which compromises trust and control. Existing approaches treat multimodal alignment, explainability, and human agency as separate concerns, leaving critical gaps in transparency and user oversight. We propose a human-artificial intelligence collaboration framework that integrates these three principles as interdependent design requirements: 1) multimodal alignment for accurate intent interpretation, 2) interaction-centric explainability delivering real-time visual, textual, and audio feedback, and 3) agency-preserving mechanisms enabling users to accept, reject, or modify artificial intelligence suggestions at any time. We present the framework through two scenarios, collaborative design and extended reality warehouse robot collaboration, chosen to span differences in time pressure and error reversibility, with the latter situated in a domain where misinterpretation carries documented safety consequences. This approach reframes collaboration as a continuous interaction property, benefiting designers, researchers, and end users by ensuring that as artificial intelligence systems grow more proactive, user understanding and control remain first-class design properties.
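
To make the three design requirements concrete, here is a minimal Python sketch of one pass through such a loop: interpret intent from several modalities, explain the interpretation, then let the user accept, reject, or modify it. This is an illustration only and not the authors' implementation. Every name in it (`Suggestion`, `Decision`, `align`, `explain`, `resolve`), the score-weighted voting rule, and the sample intents are assumptions introduced for this example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Tuple


class Decision(Enum):
    """User responses named in the abstract: accept, reject, or modify."""
    ACCEPT = "accept"
    REJECT = "reject"
    MODIFY = "modify"


@dataclass
class Suggestion:
    action: str                 # interpreted intent (hypothetical label)
    confidence: float           # share of total score behind that intent
    evidence: Dict[str, str]    # modality -> intent that modality pointed to


def align(signals: Dict[str, Tuple[str, float]]) -> Suggestion:
    """Toy multimodal alignment: score-weighted vote across modalities.

    `signals` maps a modality name to (guessed intent, score). The voting
    rule is an assumption; the source does not specify a fusion method.
    """
    if not signals:
        raise ValueError("at least one modality signal is required")
    votes: Dict[str, float] = {}
    for intent, score in signals.values():
        votes[intent] = votes.get(intent, 0.0) + score
    action = max(votes, key=votes.get)
    total = sum(votes.values())
    confidence = votes[action] / total if total else 0.0
    evidence = {modality: intent for modality, (intent, _) in signals.items()}
    return Suggestion(action, confidence, evidence)


def explain(s: Suggestion) -> Dict[str, str]:
    """Build feedback for the three channels named in the abstract."""
    cues = ", ".join(f"{m} -> {i}" for m, i in sorted(s.evidence.items()))
    text = f"I think you want '{s.action}' ({s.confidence:.0%} of evidence: {cues})."
    return {
        "text": text,
        "visual": f"highlight target of '{s.action}'",  # placeholder for a UI cue
        "audio": f"About to {s.action.replace('_', ' ')}.",
    }


def resolve(s: Suggestion, decision: Decision,
            modified_action: Optional[str] = None) -> Optional[str]:
    """Apply the user's decision; returns the action to run, or None."""
    if decision is Decision.ACCEPT:
        return s.action
    if decision is Decision.REJECT:
        return None
    if modified_action is None:
        raise ValueError("MODIFY requires a replacement action")
    return modified_action


if __name__ == "__main__":
    # Hypothetical warehouse example: speech and gesture agree, gaze differs.
    suggestion = align({
        "speech": ("move_shelf", 0.6),
        "gesture": ("move_shelf", 0.3),
        "gaze": ("pick_item", 0.4),
    })
    print(explain(suggestion)["text"])
    print(resolve(suggestion, Decision.MODIFY, "pick_item"))
```

In this sketch the explanation always exposes which modality pointed to which intent, so a disagreement (here, gaze versus speech and gesture) is visible to the user before anything executes, and `resolve` is the only path from a suggestion to an action.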

Problem

Research questions and friction points this paper is trying to address.

multimodal alignment

explainability

human agency

human-AI collaboration

transparency

Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal alignment

interaction-centric explainability

human agency