When AI Takes the Wheel: Security Analysis of Framework-Constrained Program Generation

📅 2025-10-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work reveals systematic security vulnerabilities in large language models (LLMs) when generating framework-constrained programs—exemplified by Chrome extensions. To rigorously assess their security capabilities, we introduce ChromeSecBench, the first LLM-oriented security evaluation benchmark for Chrome extensions. It comprises 140 real-world vulnerability-based prompts, used to elicit full extension implementations from nine state-of-the-art LLMs. Experimental results show that 18%–50% of generated extensions contain exploitable security flaws; vulnerability rates reach 83% in authentication and 78% in cookie management—frequently leading to sensitive browser data leakage. Counterintuitively, advanced reasoning models exhibit higher vulnerability rates. This study provides the first quantitative characterization of LLMs’ security deficiencies in framework-constrained programming and establishes a novel, security-centric paradigm for code-generation evaluation, accompanied by an open-source benchmark.

📝 Abstract
In recent years, AI has rapidly reshaped software development. Even novice developers can now design and generate complex framework-constrained software systems from high-level requirements with the help of Large Language Models (LLMs). However, as LLMs gradually "take the wheel" of software development, developers may only check whether a program works, missing security problems hidden in how the generated code is implemented. In this work, we investigate the security properties of framework-constrained programs generated by state-of-the-art LLMs. We focus specifically on Chrome extensions because of their complex security model, which involves multiple privilege boundaries and isolated components. To this end, we built ChromeSecBench, a dataset of 140 prompts based on known vulnerable extensions. We used these prompts to instruct nine state-of-the-art LLMs to generate complete Chrome extensions, then analyzed the results for vulnerabilities along three dimensions: scenario types, model differences, and vulnerability categories. Our results show that LLMs produced vulnerable programs at alarmingly high rates (18%-50%), particularly in Authentication & Identity and Cookie Management scenarios (up to 83% and 78%, respectively). Most vulnerabilities exposed sensitive browser data such as cookies, history, or bookmarks to untrusted code. Interestingly, advanced reasoning models performed worse, generating more vulnerabilities than simpler models. These findings highlight a critical gap between LLMs' coding skills and their ability to write secure framework-constrained programs.
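The cookie-leak failure mode the abstract describes can be sketched in miniature. The snippet below is an illustrative example, not code from the paper: a message handler in an extension's background script that returns cookie data to any sender versus one that checks the sender's origin first. The names `TRUSTED_ORIGINS`, `handleMessageUnsafe`, `handleMessageSafe`, and `isTrustedSender` are assumptions for this sketch; in a real Manifest V3 extension, such handlers would be wired to `chrome.runtime.onMessageExternal.addListener` and the cookie lookup to `chrome.cookies.getAll`.

```javascript
// Origins an extension author actually intends to serve (assumed example value).
const TRUSTED_ORIGINS = new Set(["https://app.example.com"]);

// Vulnerable pattern: answer a cookie request without checking who asked,
// so any page that can message the extension receives the cookies.
function handleMessageUnsafe(request, sender, getCookies) {
  if (request.type === "getCookies") {
    return getCookies(request.domain); // leaks cookies to any sender
  }
  return null;
}

// Hardened pattern: only answer senders whose origin is on the allowlist.
function isTrustedSender(sender) {
  if (!sender || !sender.origin) return false;
  return TRUSTED_ORIGINS.has(sender.origin);
}

function handleMessageSafe(request, sender, getCookies) {
  if (!isTrustedSender(sender)) return null; // drop untrusted requests
  if (request.type === "getCookies") {
    return getCookies(request.domain);
  }
  return null;
}
```

The unsafe variant is exactly the "sensitive browser data exposed to untrusted code" class the study reports as most common; the fix is a single sender check before the privileged API call.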
Problem

Research questions and friction points this paper is trying to address.

Analyzing security vulnerabilities in LLM-generated framework-constrained programs
Investigating high vulnerability rates in Chrome extensions created by AI models
Identifying security gaps between coding ability and secure program generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzed LLM-generated Chrome extensions for vulnerabilities
Built ChromeSecBench, a dataset of 140 prompts derived from known vulnerable extensions
Tested nine state-of-the-art LLMs across security dimensions
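The per-scenario figures quoted above (83% for Authentication & Identity, 78% for Cookie Management) are simple vulnerability rates over the analyzed extensions. As a hypothetical sketch of that tally, not the paper's actual harness, the field names `scenario` and `vulnerable` are assumed:

```javascript
// Given one analysis record per generated extension, compute the fraction
// of vulnerable extensions per scenario type.
function vulnerabilityRates(results) {
  const counts = new Map();
  for (const { scenario, vulnerable } of results) {
    const c = counts.get(scenario) || { total: 0, vulnerable: 0 };
    c.total += 1;
    if (vulnerable) c.vulnerable += 1;
    counts.set(scenario, c);
  }
  const rates = {};
  for (const [scenario, c] of counts) {
    rates[scenario] = c.vulnerable / c.total; // e.g. 0.83 for 83%
  }
  return rates;
}
```

The same grouping, keyed by model or by vulnerability category instead of scenario, yields the other two analysis dimensions the paper reports.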