🤖 AI Summary
Current computing education largely overlooks security risks in LLM-assisted programming, leaving students ill-equipped to identify or mitigate unsafe AI-generated code.
Method: We propose Bifröst, an educational framework that uniquely integrates three pedagogical components: (1) adversarial vulnerability code generation, in which LLMs are prompted to produce canonical insecure code (e.g., SQL injection, XSS); (2) an immersive VS Code plugin delivering real-time, context-aware static analysis feedback; and (3) guided reflective prompts triggered within authentic development workflows.
Contribution/Results: A classroom study (n=61) revealed students’ initial susceptibility to insecure LLM outputs. Post-intervention surveys (n=21) demonstrated a statistically significant increase in security vigilance toward LLM-generated code (p<0.01). Bifröst establishes a scalable, empirically validated paradigm for cultivating security-critical thinking in AI-augmented programming education.
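To make the first component concrete, here is a minimal sketch of the kind of canonical insecure code the summary cites (SQL injection). The table schema and lookup functions are hypothetical illustrations, not taken from Bifröst's actual generated exercises:

```python
import sqlite3

# In-memory demo database with a single user (hypothetical example data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

def lookup_insecure(name: str):
    # Vulnerable pattern an adversarially configured LLM might emit:
    # user input is interpolated directly into the SQL string.
    query = f"SELECT secret FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()

def lookup_safe(name: str):
    # Remediation: a parameterized query binds the input instead of
    # interpolating it, so the payload is treated as a literal value.
    return conn.execute(
        "SELECT secret FROM users WHERE name = ?", (name,)
    ).fetchall()

payload = "' OR '1'='1"
print(lookup_insecure(payload))  # injection succeeds: leaks alice's secret
print(lookup_safe(payload))      # returns []: no user with that literal name
```

A static analyzer of the kind the plugin integrates would flag the string interpolation in `lookup_insecure` and suggest the parameterized form.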
📝 Abstract
The advent of Artificial Intelligence (AI), particularly large language models (LLMs), has revolutionized software development by enabling developers to specify tasks in natural language and receive corresponding code, boosting productivity. However, this shift also introduces security risks, as LLMs may generate insecure code that can be exploited by adversaries. Current educational approaches emphasize efficiency while overlooking these risks, leaving students underprepared to identify and mitigate security issues in AI-assisted workflows.
To address this gap, we present Bifröst, an educational framework that cultivates security awareness in AI-augmented development. Bifröst integrates (1) a Visual Studio Code extension simulating realistic development environments, (2) adversarially configured LLMs that generate insecure code, and (3) a feedback system highlighting vulnerabilities. By immersing students in tasks with compromised LLMs and providing targeted security analysis, Bifröst builds critical evaluation skills. Classroom deployments (n=61) show students' susceptibility to insecure code, while a post-intervention survey (n=21) indicates increased skepticism toward LLM outputs.