Sandpiper: Orchestrated AI-Annotation for Educational Discourse at Scale

πŸ“… 2026-03-09
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This study addresses the limitations of traditional qualitative analysis when applied to large-scale educational dialogue data, including inefficiency, poor scalability, and barriers to AI integration such as privacy risks, hallucinations, and concerns about methodological rigor. To overcome these issues, the authors propose a mixed-initiative analysis system that couples an interactive visual dashboard with a codebook-constrained orchestration mechanism for large language models (LLMs), mitigating hallucinations by restricting model output to the researcher's codebook. The system incorporates a context-aware automated de-identification pipeline and leverages institutional security infrastructure to protect data privacy. By enabling continuous comparative evaluation of AI-generated annotations against human annotations, the approach aims to improve research efficiency, inter-rater reliability, and researcher trust in AI-assisted qualitative analysis.
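The summary above describes codebook-constrained orchestration as the hallucination-mitigation mechanism. A minimal sketch of how such a constraint might work, in which any label outside the fixed codebook is rejected rather than silently accepted (the codebook labels and function name here are illustrative assumptions, not taken from the paper):

```python
# Hypothetical codebook of qualitative codes (illustrative labels only).
CODEBOOK = {"scaffolding", "feedback", "off-task", "procedural"}

def validate_annotation(raw_labels):
    """Split model-proposed labels into codebook-valid and rejected sets.

    Labels not present in the codebook are surfaced as rejections so that
    hallucinated codes never enter the annotated dataset unnoticed.
    """
    accepted = [label for label in raw_labels if label in CODEBOOK]
    rejected = [label for label in raw_labels if label not in CODEBOOK]
    return accepted, rejected

# "encouragement" is not a codebook label, so it is flagged for review.
accepted, rejected = validate_annotation(["feedback", "encouragement"])
```

In a real pipeline this check would sit between the LLM's structured output and the annotation store, with rejected labels routed back to a human reviewer.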

πŸ“ Abstract
Digital educational environments are expanding toward complex AI and human discourse, providing researchers with an abundance of data that offers deep insights into learning and instructional processes. However, traditional qualitative analysis remains a labor-intensive bottleneck, severely limiting the scale at which this research can be conducted. We present Sandpiper, a mixed-initiative system designed to serve as a bridge between high-volume conversational data and human qualitative expertise. By tightly coupling interactive researcher dashboards with agentic Large Language Model (LLM) engines, the platform enables scalable analysis without sacrificing methodological rigor. Sandpiper addresses critical barriers to AI adoption in education by implementing context-aware, automated de-identification workflows supported by secure, university-housed infrastructure to ensure data privacy. Furthermore, the system employs schema-constrained orchestration to eliminate LLM hallucinations and enforces strict adherence to qualitative codebooks. An integrated evaluations engine allows for the continuous benchmarking of AI performance against human labels, fostering an iterative approach to model refinement and validation. We propose a user study to evaluate the system's efficacy in improving research efficiency, inter-rater reliability, and researcher trust in AI-assisted qualitative workflows.
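The abstract mentions an evaluations engine that continuously benchmarks AI annotations against human labels for inter-rater reliability. One standard agreement statistic for such comparisons is Cohen's kappa; a minimal sketch follows (the choice of kappa is an assumption for illustration, since the abstract does not name a specific metric):

```python
from collections import Counter

def cohens_kappa(ai_labels, human_labels):
    """Chance-corrected agreement between two annotators over the same items."""
    assert len(ai_labels) == len(human_labels) and ai_labels
    n = len(ai_labels)
    # Observed agreement: fraction of items where the two labelers match.
    observed = sum(a == h for a, h in zip(ai_labels, human_labels)) / n
    # Expected agreement under chance, from each labeler's marginal frequencies.
    ai_freq, human_freq = Counter(ai_labels), Counter(human_labels)
    expected = sum(
        ai_freq[code] * human_freq[code]
        for code in set(ai_labels) | set(human_labels)
    ) / (n * n)
    return (observed - expected) / (1 - expected)

# Agreement on 3 of 4 items yields kappa = 0.5 for these marginals.
kappa = cohens_kappa(["a", "a", "b", "b"], ["a", "a", "b", "a"])
```

Tracked over successive codebook and prompt revisions, a statistic like this supports the iterative refinement loop the abstract describes.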
Problem

Research questions and friction points this paper is trying to address.

qualitative analysis
AI annotation
educational discourse
data privacy
LLM hallucination
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI-assisted qualitative analysis
schema-constrained orchestration
context-aware de-identification
LLM hallucination mitigation
mixed-initiative system
Daryl Hedley
National Tutoring Observatory
Doug Pietrzak
National Tutoring Observatory
Jorge Dias
National Tutoring Observatory
Ian Burden
National Tutoring Observatory
Bakhtawar Ahtisham
National Tutoring Observatory
Zhuqian Zhou
National Tutoring Observatory
Kirk Vanacore
National Tutoring Observatory
Josh Marland
National Tutoring Observatory
Rachel Slama
National Tutoring Observatory
Justin Reich
Associate Professor, MIT Comparative Media Studies/Writing
technology, education, MOOCs, teacher education
Kenneth Koedinger
National Tutoring Observatory
RenΓ© Kizilcec
National Tutoring Observatory