SHAPE: Unifying Safety, Helpfulness and Pedagogy for Educational LLMs

📅 2026-04-23

🤖 AI Summary
This work addresses โ€œpedagogical jailbreakโ€ attacks in educational settings, where students bypass guided instruction by using adversarial prompts to directly extract answers from large language models (LLMs). The study formally defines this problem for the first time and introduces SHAPE, a benchmark comprising 9,087 student query pairs. It proposes a unified optimization framework grounded in knowledge mastery graphs, which employs graph-augmented reasoning to infer prerequisite knowledge and identify gaps in understanding, coupled with an explicit gating mechanism that dynamically routes between instructional guidance and direct problem-solving. Evaluated across multiple LLMs, the approach significantly enhances resistance to jailbreak attacks while achieving near-optimal helpfulness, thereby jointly optimizing safety, utility, and pedagogical effectiveness.

๐Ÿ“ Abstract
Large Language Models (LLMs) have been widely explored in educational scenarios. We identify a critical vulnerability in current educational LLMs: pedagogical jailbreaks, in which students use answer-inducing prompts to elicit solutions rather than scaffolded instruction. To enable systematic study, we unify and formalize safe, helpful, and pedagogical behaviors with a knowledge-mastery graph and introduce SHAPE, a benchmark of 9,087 student-question pairs for evaluating tutoring behavior under adversarial pressure. We propose a graph-augmented tutoring pipeline that infers prerequisite concepts from queries, identifies mastery gaps, and routes generation between instructing and problem-solving via explicit gating. Experiments across multiple LLMs show that our method yields significantly improved safety under two pedagogical jailbreak settings, while maintaining near-ceiling helpfulness under the same evaluation protocol. Our code and data are available at https://github.com/MAPS-research/SHaPE
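The abstract describes the tutoring pipeline only at a high level: infer prerequisite concepts, identify mastery gaps, and route via an explicit gate. The sketch below illustrates that flow under assumptions of our own and is not the authors' implementation (the linked repository holds that). It assumes the knowledge-mastery graph is a plain prerequisite DAG, that mastery is binary, and that the concepts a query touches are already given (the inference step, e.g. an LLM call, is not modeled). The names `MasteryGraph`, `prerequisite_closure`, `mastery_gaps`, and `gate`, and the example concepts, are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class MasteryGraph:
    """Hypothetical knowledge-mastery graph (illustrative assumption).

    prereqs maps each concept to the concepts it directly depends on;
    mastered holds the concepts the student is assumed to already know.
    """
    prereqs: dict[str, list[str]]
    mastered: set[str] = field(default_factory=set)

    def prerequisite_closure(self, targets: list[str]) -> set[str]:
        """Return the targets plus every concept reachable via prerequisite edges (BFS)."""
        seen: set[str] = set()
        queue = deque(targets)
        while queue:
            concept = queue.popleft()
            if concept in seen:
                continue  # already expanded; also guards against accidental cycles
            seen.add(concept)
            queue.extend(self.prereqs.get(concept, []))
        return seen

    def mastery_gaps(self, targets: list[str]) -> set[str]:
        """Concepts the query requires that the student has not mastered."""
        return self.prerequisite_closure(targets) - self.mastered


def gate(graph: MasteryGraph, query_concepts: list[str]) -> tuple[str, set[str]]:
    """Hypothetical explicit gate.

    query_concepts stands in for the concepts inferred from the student's query.
    The route is "instruct" whenever any gap remains and "solve" only when none
    do; the wording of the prompt never enters the decision.
    """
    gaps = graph.mastery_gaps(query_concepts)
    return ("instruct" if gaps else "solve"), gaps


if __name__ == "__main__":
    g = MasteryGraph(
        prereqs={
            "quadratic_formula": ["square_roots", "completing_the_square"],
            "completing_the_square": ["factoring"],
        },
        mastered={"square_roots", "factoring"},
    )
    mode, gaps = gate(g, ["quadratic_formula"])
    print(mode, sorted(gaps))  # instruct ['completing_the_square', 'quadratic_formula']
```

Because this sketch computes the route from mastery state alone, rephrasing a request as an answer-inducing prompt cannot flip the decision here. How the actual system combines its gate with LLM generation is not captured by this sketch.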
Problem

Research questions and friction points this paper is trying to address.

pedagogical jailbreaks
educational LLMs
answer-inducing prompts
scaffolded instructions
tutoring behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

pedagogical jailbreaks
knowledge-mastery graph
graph-augmented tutoring
educational LLMs
SHAPE benchmark