SimCoachCorpus: A naturalistic dataset with language and trajectories for embodied teaching

📅 2025-09-17
🤖 AI Summary
Existing datasets for embodied skill teaching rarely align natural-language instruction with continuous physical action over time. To address this gap, the paper introduces SimCoachCorpus, a multimodal dataset of embodied teaching collected in a race car driving simulator. It records vehicle state, control inputs, and track information (boundaries, raceline, and cone landmarks) from 29 participants, 15 of whom received one-on-one instruction from a professional performance driving coach while 14 drove without coaching. The driving data are synchronized with the coach's concurrent verbal feedback and with terminal feedback given at the end of each lap. The dataset also provides annotations of the coaching category for each concurrent utterance, ratings of student compliance with coaching advice, and self-reported cognitive load and emotional state. It comprises over 40 hours of driving data, more than 20,000 concurrent feedback utterances, and more than 400 terminal feedback utterances. The authors demonstrate applications to in-context learning, imitation learning, and topic modeling, and state that the dataset will be released publicly once the peer-reviewed version of the paper is published.

📝 Abstract
Curated datasets are essential for training and evaluating AI approaches, but are often lacking in domains where language and physical action are deeply intertwined. In particular, few datasets capture how people acquire embodied skills through verbal instruction over time. To address this gap, we introduce SimCoachCorpus: a unique dataset of race car simulator driving that allows for the investigation of rich interactive phenomena during guided and unguided motor skill acquisition. In this dataset, 29 humans were asked to drive in a simulator around a race track for approximately ninety minutes. Fifteen participants were given personalized one-on-one instruction from a professional performance driving coach, and 14 participants drove without coaching. SimCoachCorpus includes embodied features such as vehicle state and inputs, map (track boundaries and raceline), and cone landmarks. These are synchronized with concurrent verbal coaching from a professional coach and additional feedback at the end of each lap. We further provide annotations of coaching categories for each concurrent feedback utterance, ratings on students' compliance with coaching advice, and self-reported cognitive load and emotional state of participants (gathered from surveys during the study). The dataset includes over 20,000 concurrent feedback utterances, over 400 terminal feedback utterances, and over 40 hours of vehicle driving data. Our naturalistic dataset can be used for investigating motor learning dynamics, exploring linguistic phenomena, and training computational models of teaching. We demonstrate applications of this dataset for in-context learning, imitation learning, and topic modeling. The dataset introduced in this work will be released publicly upon publication of the peer-reviewed version of this paper. Researchers interested in early access may register at https://tinyurl.com/SimCoachCorpusForm.
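
The abstract says vehicle data are synchronized with concurrent coaching utterances but does not describe the release format. As a rough illustration of what such synchronization allows, the sketch below pairs each utterance with the most recent telemetry sample at or before the moment it begins. Everything in it is an assumption made for illustration: the `TelemetrySample` and `Utterance` types, their field names, the use of seconds since session start as the time base, and the example values are hypothetical and are not taken from the dataset.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TelemetrySample:
    """Hypothetical vehicle-state record; field names are assumptions."""
    t: float          # seconds since session start
    speed_mps: float  # vehicle speed in meters per second
    steering: float   # normalized steering input


@dataclass
class Utterance:
    """Hypothetical coaching utterance; field names are assumptions."""
    t_start: float    # seconds since session start
    text: str


def align(
    utterances: List[Utterance], samples: List[TelemetrySample]
) -> List[Tuple[Utterance, Optional[TelemetrySample]]]:
    """Pair each utterance with the latest sample at or before its start.

    `samples` must be sorted by time. An utterance that starts before
    the first sample is paired with None.
    """
    times = [s.t for s in samples]
    pairs = []
    for u in utterances:
        # Index of the first sample strictly later than the utterance start.
        i = bisect_right(times, u.t_start)
        pairs.append((u, samples[i - 1] if i > 0 else None))
    return pairs


# Made-up values, for illustration only.
samples = [
    TelemetrySample(t=0.0, speed_mps=20.0, steering=0.0),
    TelemetrySample(t=0.5, speed_mps=22.0, steering=0.1),
    TelemetrySample(t=1.0, speed_mps=24.0, steering=0.3),
]
utterances = [Utterance(t_start=0.7, text="placeholder utterance")]
aligned = align(utterances, samples)
```

A backward-looking match is used here so that each utterance is paired with the vehicle state the coach could have observed before speaking; a nearest-neighbor or forward-looking window would be a reasonable alternative for studying how the student responds afterward.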
Problem

Research questions and friction points this paper is trying to address.

Lack of datasets combining language and physical action in skill acquisition
Investigating embodied teaching through verbal instruction over time
Capturing motor learning dynamics with synchronized coaching and driving data
Innovation

Methods, ideas, or system contributions that make the work stand out.

SimCoachCorpus dataset with embodied trajectories and language
Synchronized vehicle data and professional verbal coaching feedback
Annotations for motor learning dynamics and computational teaching models