Disability-First AI Dataset Annotation: Co-designing Stuttered Speech Annotation Guidelines with People Who Stutter

📅 2026-02-11

🤖 AI Summary

This study addresses a common problem in AI speech datasets: disfluent speech, such as the speech of people who stutter (PWS), is often annotated by crowdworkers without lived experience of stuttering or disability-specific expertise, which leads to inconsistent or inaccurate labels. To address this, the authors conducted interviews and co-design workshops with PWS and domain experts, bringing the embodied experience of PWS into the annotation process. They advocate a participatory approach to annotation grounded in “disability-first” and “multiplicity-aware” principles. The work surfaces tensions between the complexity of disability experiences and the rigidity of static labels, and contributes stuttered speech annotation guidelines co-designed with PWS. The findings highlight the value of embodied knowledge for improving dataset quality and point to ways of integrating disability perspectives across the AI pipeline.

📝 Abstract

Despite efforts to increase the representation of disabled people in AI datasets, accessibility datasets are often annotated by crowdworkers without disability-specific expertise, leading to inconsistent or inaccurate labels. This paper examines these annotation challenges through a case study of annotating speech data from people who stutter (PWS). Given the variability of stuttering and differing views on how it manifests, annotating and transcribing stuttered speech remains difficult, even for trained professionals. Through interviews and co-design workshops with PWS and domain experts, we identify challenges in stuttered speech annotation and develop practices that integrate the lived experiences of PWS into the annotation process. Our findings highlight the value of embodied knowledge in improving dataset quality, while revealing tensions between the complexity of disability experiences and the rigidity of static labels. We conclude with implications for disability-first and multiplicity-aware approaches to data interpretation across the AI pipeline.

Problem

disability representation

stuttered speech annotation

AI dataset bias

accessibility datasets

annotation inconsistency

Innovation

disability-first

co-design

stuttered speech annotation