Enhancing the Robustness of LLM-Generated Code: Empirical Study and Framework

📅 2025-03-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Code generated by large language models (LLMs) is often insufficiently robust, most commonly because it lacks input validation and error handling. Method: This paper presents the first empirical study of LLM-generated code robustness and proposes RobGen, a lightweight, training-free framework. The paper (1) introduces novel robustness metrics; (2) finds that models often omit a needed conditional even though the “if” token ranks among the top three candidates in 69% of such cases, which suggests the model implicitly recognizes the control structure but does not sample it; and (3) implements two complementary enhancement mechanisms: decoding-time dynamic probability adjustment (RobGen-Adj) and post-hoc conditional statement insertion (RobGen-Ins). Contribution/Results: RobGen reduces the proportion of LLM-generated code that is less robust than human-written code by 20.0%, improving reliability across diverse tasks. All code and data are publicly released.

📝 Abstract

Ensuring the robustness of code generated by large language models (LLMs) is crucial for real-world reliability. However, existing evaluations predominantly focus on correctness, often neglecting key robustness concerns such as missing input validation and insufficient error handling. In this paper, we present the first empirical study on the robustness of LLM-generated code. We introduce novel robustness metrics and analyze four state-of-the-art code LLMs, revealing that, on average, 43.1% of their generated code is less robust than human-written counterparts. Notably, over 90% of robustness deficiencies stem from missing conditional checks, with 70% of these omissions occurring in the first line of code. Additionally, in 69% of cases where a conditional statement is necessary but absent, the "if" token still ranks third or higher in the model's predicted token probabilities, indicating an implicit recognition of control structures. Building on these findings, we propose RobGen, a framework designed to enhance code robustness without requiring model retraining. RobGen leverages two model-agnostic techniques: RobGen-Adj, which dynamically adjusts token probabilities during decoding to encourage the inclusion of control structures, and RobGen-Ins, which improves generated code by inserting missing conditionals after generation. Experimental results demonstrate that RobGen reduces the proportion of less robust model-generated code by 20.0%, significantly enhancing code reliability across diverse tasks. As a lightweight and adaptable solution, RobGen effectively mitigates robustness challenges in LLM-generated code. All code and data are available at https://github.com/SYSUSELab/RobGen.
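The decoding-time idea behind RobGen-Adj can be illustrated with a toy example. The sketch below is not the authors' implementation: the function name `boost_token`, the additive `bonus`, the `top_k=3` cutoff (chosen to mirror the "third or higher" observation), and the hand-written logits are all assumptions made for illustration. It raises the score of the "if" token only when the model already ranks it near the top.

```python
import math


def boost_token(logits, token="if", top_k=3, bonus=2.0):
    """Return a copy of `logits` with `token` boosted when it already ranks in the top_k.

    Hypothetical rule for illustration only; the paper's actual adjustment
    policy is not specified in this summary.
    """
    ranked = sorted(logits, key=logits.get, reverse=True)
    adjusted = dict(logits)
    if token in logits and ranked.index(token) < top_k:
        adjusted[token] += bonus
    return adjusted


def softmax(logits):
    """Convert a token -> logit mapping into a token -> probability mapping."""
    peak = max(logits.values())
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Toy next-token scores at the first line of a function body (made-up numbers).
logits = {"return": 2.0, "if": 1.5, "for": 0.5, "x": 0.1}
adjusted = boost_token(logits)
probs = softmax(adjusted)
best = max(probs, key=probs.get)
```

In this toy setting "if" starts in second place and overtakes "return" after the boost. A token that ranks outside the top three is left untouched, so the adjustment only acts where the model already shows some preference for a conditional.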

Problem

Research questions and friction points this paper is trying to address.

Assessing robustness of LLM-generated code versus human-written code

Addressing missing conditional checks in LLM-generated code

Proposing a framework to enhance code robustness without retraining

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces novel robustness metrics for code

Proposes RobGen framework for enhancing code robustness

Uses decoding-time token probability adjustment (RobGen-Adj) and post-generation conditional insertion (RobGen-Ins)
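Post-generation conditional insertion can be sketched in a similar spirit. The example below is a simplified stand-in, not RobGen-Ins itself: the helper `insert_none_guard` is hypothetical, and the choice of guard (a `None` check on the first parameter) is an assumption, since this summary does not say how the framework decides which conditional to insert. It uses Python's standard `ast` module and requires Python 3.9+ for `ast.unparse`.

```python
import ast


def insert_none_guard(source: str) -> str:
    """Insert a None check at the top of each function that lacks a leading `if`.

    Hypothetical illustration: the guard chosen here is an assumption,
    not the rule used by RobGen-Ins.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if not isinstance(node, ast.FunctionDef) or not node.args.args:
            continue
        # Keep an existing docstring as the first statement.
        pos = 1 if ast.get_docstring(node) is not None else 0
        # Skip functions whose first real statement is already a conditional.
        if pos < len(node.body) and isinstance(node.body[pos], ast.If):
            continue
        param = node.args.args[0].arg
        guard = ast.parse(
            f"if {param} is None:\n"
            f"    raise ValueError('{param} must not be None')"
        ).body[0]
        node.body.insert(pos, guard)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


# A generated function with no input validation on its first line.
generated = "def mean(values):\n    return sum(values) / len(values)\n"
patched = insert_none_guard(generated)
```

The guard lands on the first line of the function body, which is where the abstract reports that 70% of the missing conditionals occur. Running the helper on already-patched code leaves it unchanged, because the leading `if` is detected.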

🔎 Similar Papers

How Well Do Large Language Models Serve as End-to-End Secure Code Producers?