SoK: Understanding (New) Security Issues Across AI4Code Use Cases

📅 2025-12-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
AI-for-Code (AI4Code) systems face severe security risks: benchmarks biased toward Python and toy problems, the absence of standardized security benchmarks, data leakage in evaluation protocols, and poor adversarial robustness together lead to unsafe code generation, inaccurate vulnerability detection, and semantically flawed translation. This work argues for a "security-first" AI4Code paradigm. It systematically identifies shared security deficiencies across three core tasks: code generation, vulnerability detection, and code translation. The authors propose a fine-grained security evaluation framework, conduct cross-model comparative experiments, and design semantic-preserving adversarial attacks. From this analysis they distill three directions: (1) secure-by-default code generation, (2) robust vulnerability-detection benchmark construction, and (3) security-enhanced code translation. Empirical results reveal pervasive security degradation across mainstream models, and the study closes with a standardized AI4Code security evaluation suite and practical defense guidelines.
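The summary mentions semantic-preserving adversarial attacks on vulnerability detectors. The paper's concrete attacks are not reproduced here; the following is only a minimal Python sketch of one common transformation of this kind, identifier renaming via the standard `ast` module, where the snippet, the mapping, and all names are invented for illustration. The point is that the program's surface form (which a brittle detector may key on) changes while its behaviour does not:

```python
import ast

class IdentifierRenamer(ast.NodeTransformer):
    """Rename variables/parameters according to a mapping, preserving semantics."""
    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        # Rewrite both loads and stores of mapped names.
        if node.id in self.mapping:
            node.id = self.mapping[node.id]
        return node

    def visit_arg(self, node):
        # Rewrite function parameter names as well.
        if node.arg in self.mapping:
            node.arg = self.mapping[node.arg]
        return node

def semantic_preserving_rename(source, mapping):
    tree = ast.parse(source)
    tree = IdentifierRenamer(mapping).visit(tree)
    return ast.unparse(tree)

# A vulnerable-looking snippet: SQL built by string concatenation (CWE-89 style).
original = '''
def lookup(user_input):
    query = "SELECT * FROM users WHERE name = '" + user_input + "'"
    return query
'''

# Innocuous-sounding names can flip a brittle detector's verdict,
# even though the transformed function is behaviourally identical.
attacked = semantic_preserving_rename(
    original, {"user_input": "validated_name", "query": "safe_template"}
)
print(attacked)
```

A robust detector should score `original` and `attacked` identically, which is exactly the invariance such attacks are designed to test.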

📝 Abstract
AI-for-Code (AI4Code) systems are reshaping software engineering, with tools like GitHub Copilot accelerating code generation, translation, and vulnerability detection. Alongside these advances, however, security risks remain pervasive: insecure outputs, biased benchmarks, and susceptibility to adversarial manipulation undermine their reliability. This SoK surveys the landscape of AI4Code security across three core applications, identifying recurring gaps: benchmark dominance by Python and toy problems, lack of standardized security datasets, data leakage in evaluation, and fragile adversarial robustness. A comparative study of six state-of-the-art models illustrates these challenges: insecure patterns persist in code generation, vulnerability detection is brittle to semantic-preserving attacks, fine-tuning often misaligns security objectives, and code translation yields uneven security benefits. From this analysis, we distill three forward paths: embedding secure-by-default practices in code generation, building robust and comprehensive detection benchmarks, and leveraging translation as a route to security-enhanced languages. We call for a shift toward security-first AI4Code, where vulnerability mitigation and robustness are embedded throughout the development life cycle.
Problem

Research questions and friction points this paper is trying to address.

Addresses security risks in AI4Code systems, such as insecure outputs and susceptibility to adversarial manipulation
Identifies gaps in benchmarks, datasets, and robustness across code generation, vulnerability detection, and code translation
Proposes secure-by-default practices and robust benchmarks to embed security throughout AI4Code development
Innovation

Methods, ideas, or system contributions that make the work stand out.

Embed secure-by-default practices in code generation
Build robust, comprehensive vulnerability-detection benchmarks
Leverage code translation as a route to security-enhanced languages
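To make the first of these directions concrete, here is a minimal illustration of the secure-by-default pattern the abstract advocates: parameterized queries instead of string-built SQL. This is not code from the paper; the function names, schema, and payload are invented for the sketch, using only Python's standard `sqlite3` module:

```python
import sqlite3

def find_user_insecure(conn, name):
    # Insecure default: string concatenation lets input rewrite the query.
    return conn.execute(
        "SELECT id FROM users WHERE name = '" + name + "'"
    ).fetchall()

def find_user_secure(conn, name):
    # Secure by default: a placeholder keeps the input as data, not SQL.
    return conn.execute(
        "SELECT id FROM users WHERE name = ?", (name,)
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'alice')")

payload = "x' OR '1'='1"
print(find_user_insecure(conn, payload))  # injection leaks every row
print(find_user_secure(conn, payload))    # no row matches the literal string
```

A security-first generator would emit the parameterized form by default, so the safe pattern requires no extra effort from the user.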