🤖 AI Summary
Problem: Prior AI-in-education research has predominantly focused on unimodal AI tools, neglecting how novice programmers select and use multimodal generative AI (capable of processing text, audio, images, and screen sharing) in learning contexts.
Method: We conducted think-aloud sessions (N=16) with follow-up semi-structured interviews, complemented by participant observation and qualitative thematic analysis, to examine students’ interaction patterns with a commercially available multimodal AI platform.
Contribution/Results: This study empirically identifies three key determinants of modality preference (task complexity, expressive efficiency, and technical familiarity) and characterizes distinct usage scenarios. It challenges the unimodal paradigm in AI-enhanced programming education and provides empirical grounding for designing modality-adaptive, personalized AI-augmented pedagogical systems. The findings inform design principles for intelligent tutoring systems that dynamically align AI modalities with learners’ cognitive needs and contextual constraints.
📝 Abstract
The broad adoption of Generative AI (GenAI) is impacting Computer Science (CS) education, and recent studies have identified both benefits and potential concerns when students use it to learn programming. However, most existing explorations focus on GenAI tools that primarily support text-to-text interaction. With recent developments, GenAI applications have begun supporting multiple modes of communication, known as multimodality. In this work, we explored how undergraduate programming novices choose and work with multimodal GenAI tools, and the criteria behind their choices. We selected a commercially available multimodal GenAI platform because it supports multiple input and output modalities, including text, audio, image upload, and real-time screen sharing. Through 16 think-aloud sessions that combined participant observation with follow-up semi-structured interviews, we investigated students' modality choices when completing programming problems with GenAI tools and the criteria underlying those choices. With multimodal communication emerging as the future of AI in education, this work aims to spark continued exploration of how students interact with multimodal GenAI in the context of CS education.