🤖 AI Summary
This study addresses the governance dilemma that generative artificial intelligence (GenAI) poses for STEM assessment: outright prohibition is impractical, yet unrestricted use compromises validity. Grounded in the Evidence-Centered Design (ECD) framework, the work treats GenAI as a design variable within the assessment argument and proposes three governance stances (Restrict, Scaffold, and Require), chosen according to the target construct. The approach is implemented through two task designs in an introductory university physics course. It elicits process artifacts such as prompts, critiques, and revisions and scores them with rubrics grounded in student data and expert knowledge, making disciplinary AI interaction competency observable and assessable. The study shows that this competency can be observed in student artifacts and scored defensibly, and it offers a principled governance pathway for STEM assessment in the AI era.
📝 Abstract
Generative Artificial Intelligence (GenAI) presents a governance challenge for STEM assessment. Unrestricted GenAI access enables task outsourcing that undermines the validity of traditional assessments. Blanket prohibitions are difficult to enforce, may push use underground, and do little to prepare students for workplaces where GenAI-supported workflows are increasingly common. This paper addresses the dilemma with a framework grounded in Evidence-Centered Design (ECD) that treats GenAI as a design variable within the assessment argument rather than as an external threat to it. The framework analyzes how GenAI reshapes the student model, evidence model, and task model. It uses this analysis to articulate three principled governance stances that specify when to restrict, scaffold, or require GenAI use in STEM assessment. Restrict is warranted when GenAI would contaminate the inferential link between student work products and the targeted unaided proficiency. Scaffold is warranted when bounded GenAI support can address peripheral demands without substituting for the target construct, thereby preserving inferential interpretability. Require is warranted when two conditions hold: the target construct is disciplinary AI interaction competency, and tasks can be designed to elicit process artifacts, including prompts, critiques, and revisions, that make student reasoning observable, scorable, and distinguishable from AI-generated output. We present two task designs deployed in an introductory physics course. We demonstrate that disciplinary AI interaction competencies are observable in student response artifacts and can be scored using defensible rubrics grounded in student data and expert knowledge. By situating GenAI governance within validity arguments, the framework offers actionable guidance for preserving learning integrity while supporting authentic preparation for AI-enabled professional environments.
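The three stances can be read as a decision rule keyed to the target construct. Below is a minimal sketch of that rule under simplifying assumptions, not the authors' procedure. The names `TaskProfile`, `governance_stance`, and the three boolean fields are hypothetical. They compress what would in practice be a qualitative ECD analysis of the student, evidence, and task models. The case where the construct is AI interaction competency but no process artifacts can be elicited is not covered by the stated conditions, so the sketch flags it for task redesign instead of guessing a stance.

```python
from dataclasses import dataclass
from enum import Enum


class Stance(Enum):
    RESTRICT = "restrict"
    SCAFFOLD = "scaffold"
    REQUIRE = "require"


@dataclass(frozen=True)
class TaskProfile:
    """Hypothetical yes/no summary of an ECD analysis for one task."""
    targets_ai_interaction: bool    # target construct is disciplinary AI interaction competency
    yields_process_artifacts: bool  # task elicits prompts, critiques, and revisions
    ai_limited_to_peripheral: bool  # bounded GenAI help touches only peripheral demands


def governance_stance(task: TaskProfile) -> Stance:
    """Map a task profile to a stance (simplified, assumed decision rule)."""
    if task.targets_ai_interaction:
        # Require only when student reasoning can be separated from AI output.
        if task.yields_process_artifacts:
            return Stance.REQUIRE
        raise ValueError("no stance warranted: redesign the task to elicit process artifacts")
    # Target is unaided proficiency: scaffold only if GenAI cannot do the construct work.
    if task.ai_limited_to_peripheral:
        return Stance.SCAFFOLD
    return Stance.RESTRICT


if __name__ == "__main__":
    # Hypothetical task, not one of the paper's physics tasks.
    example_task = TaskProfile(
        targets_ai_interaction=True,
        yields_process_artifacts=True,
        ai_limited_to_peripheral=False,
    )
    print(governance_stance(example_task))  # Stance.REQUIRE
```

The sketch checks the construct first because, in the framework, the target construct rather than the availability of GenAI determines which stance applies.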