Towards Responsible Development of Generative AI for Education: An Evaluation-Driven Approach

📅 2024-05-21
🏛️ arXiv.org
📈 Citations: 60
Influential: 8
🤖 AI Summary
This study addresses two obstacles to deploying generative AI in education: the difficulty of verbalising pedagogical intuitions as effective prompts, and the lack of good evaluation practices, compounded by the challenge of defining excellent pedagogy. The authors propose an evaluation-driven approach to responsible development: (1) working with learners and educators, they translate high-level principles from learning science into seven diverse educational benchmarks that span quantitative, qualitative, automatic, and human evaluations; and (2) they build new fine-tuning datasets to improve the pedagogical capabilities of Gemini, yielding LearnLM-Tutor. In the reported evaluations, educators and learners consistently prefer LearnLM-Tutor over a prompt-tuned Gemini on a number of pedagogical dimensions. The authors present the work as a first step towards a comprehensive educational evaluation framework.

📝 Abstract
A major challenge facing the world is the provision of equitable and universal access to quality education. Recent advances in generative AI (gen AI) have created excitement about the potential of new technologies to offer a personal tutor for every learner and a teaching assistant for every teacher. The full extent of this dream, however, has not yet materialised. We argue that this is primarily due to the difficulties with verbalising pedagogical intuitions into gen AI prompts and the lack of good evaluation practices, reinforced by the challenges in defining excellent pedagogy. Here we present our work collaborating with learners and educators to translate high-level principles from learning science into a pragmatic set of seven diverse educational benchmarks, spanning quantitative, qualitative, automatic and human evaluations; and to develop a new set of fine-tuning datasets to improve the pedagogical capabilities of Gemini, introducing LearnLM-Tutor. Our evaluations show that LearnLM-Tutor is consistently preferred over a prompt-tuned Gemini by educators and learners on a number of pedagogical dimensions. We hope that this work can serve as a first step towards developing a comprehensive educational evaluation framework, and that this can enable rapid progress within the AI and EdTech communities towards maximising the positive impact of gen AI in education.
Problem

Research questions and friction points this paper is trying to address.

Pedagogical intuitions are difficult to verbalise as gen AI prompts
Good evaluation practices for gen AI in education are lacking
Excellent pedagogy is hard to define, which makes evaluation harder
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed seven diverse educational benchmarks
Created fine-tuning datasets for Gemini
Introduced LearnLM-Tutor for improved pedagogy
Authors

Irina Jurenka
DeepMind
Artificial Intelligence, Neuroscience, Unsupervised Learning, Generative Models, Representation

Markus Kunesch
Google DeepMind

Kevin R. McKee
Staff Research Scientist, Google DeepMind
Cooperative AI, Human data, Social cognition, Participatory AI, Reinforcement learning

Daniel Gillick
Research Scientist, Google
Natural Language Processing

Shaojian Zhu
Google DeepMind

Sara Wiltberger
Google DeepMind

Shubham Milind Phal
Google DeepMind

Katherine Hermann
Google DeepMind

Daniel Kasenberg
Research Scientist, Google DeepMind
Artificial Intelligence

Avishkar Bhoopchand
Google DeepMind

Ankit Anand
Research Scientist, Google DeepMind
Artificial Intelligence, Machine Learning, Algorithms

Miruna Pislar
Google DeepMind

Stephanie Chan
Google DeepMind

Lisa Wang
DeepMind
Education, Machine Learning, Intelligent Tutoring Systems

Jennifer She
Google DeepMind

Parsa Mahmoudieh
Google DeepMind

Aliya Rysbek
Google DeepMind

Wei-Jen Ko
Google

Andrea Huber
Google DeepMind

Brett Wiltshire
Google DeepMind

G. Elidan
Google Research

Roni Rabin
Google Research

Jasmin Rubinovitz
Google

Amit Pitaru
Google Creative Lab

Mac McAllister
Google

Julia Wilkowski
Google

David Choi
Anthropic, work carried out while employed at Google DeepMind

Roee Engelberg
Google Research

Lidan Hackmon
Google Research

Adva Levin
Google Research

Rachel Griffin
Arizona State University

Michael Sears
Arizona State University

Filip Bar
Lund University

Mia Mesar
Google

Mana Jabbour
Google

Arslan Chaudhry
DeepMind
Machine Learning, Artificial Intelligence

James Cohan
Google

Sridhar Thiagarajan
Google DeepMind

Nir Levine
Research Engineer at DeepMind
Machine Learning, Reinforcement Learning, Artificial Intelligence

Ben Brown
Google DeepMind

Dilan Gorur
Google DeepMind

Svetlana Grant
Google DeepMind

Rachel Hashimoshoni
Google

Laura Weidinger
Staff Research Scientist at DeepMind

Jieru Hu
Google DeepMind

Dawn Chen
Google

Kuba Dolecki
Google

Canfer Akbulut
Google DeepMind

Maxwell L. Bileschi
Google DeepMind

Laura Culp
Google DeepMind

Wen-Xin Dong
Google

Nahema Marchal
Senior Research Scientist, DeepMind
Computational Social Science, Digital Politics, Artificial Intelligence, Democracy, Polarization

Kelsi Van Deman
Google Creative Lab

Hema Bajaj Misra
Google

Michael Duah
Arizona State University

Moran Ambar
Google Research

Avi Caciularu
Google Research

Sandra Lefdal
Google DeepMind

Christopher Summerfield
University of Oxford
Cognitive Science, Neuroscience

James An
Google DeepMind

P. Kamienny
Google DeepMind

Abhinit Mohdi
Google

Theofilos Strinopoulous
Google

Annie Hale
Arizona State University

Wayne Anderson
Arizona State University

Luis C. Cobo
Google DeepMind

Niv Efron
Google Research

Muktha Ananda
Google

Shakir Mohamed
Research Director, Google DeepMind
Machine Learning, Bayesian Statistics, Deep Learning, Sociotechnical AI, Artificial Intelligence

Maureen Heymans
Google

Z. Ghahramani
Google DeepMind

Yossi Matias
Google

Ben Gomes
Google

Lila Ibrahim
Google DeepMind