🤖 AI Summary
It remains unclear whether natural language proficiency—distinct from prompt engineering techniques—independently influences code generation quality in large language models (LLMs).
Method: We systematically controlled the English proficiency level of prompts for 164 programming tasks from the HumanEval benchmark according to the CEFR framework (A2–C2), and evaluated the proficiency and correctness of the generated code across multiple foundation models.
Contribution/Results: We find that LLMs exhibit a default preference for B2-level prompts; increasing prompt proficiency consistently improves pass rates across models (+3.2–8.7%), irrespective of specific prompt engineering strategies. This is the first empirical demonstration that natural language proficiency itself constitutes a key, controllable variable for modulating code generation quality. Our findings provide both empirical grounding and actionable guidance for designing language interfaces in AI-powered programming tools.
📝 Abstract
With the widespread adoption of Foundation Model (FM)-powered tools in software engineering, the natural language prompt has become a critical interface between developers and Large Language Models (LLMs). While much research has focused on prompt structure, natural language proficiency is an underexplored factor that can influence the quality of generated code. This paper investigates whether English language proficiency itself, independent of the prompting technique, affects the proficiency and correctness of code generated by LLMs. Using the HumanEval dataset, we systematically varied the English proficiency of prompts from basic to advanced for 164 programming tasks and measured the resulting code proficiency and correctness. Our findings show that LLMs default to an intermediate (B2) natural language level. While the effect on the resulting code proficiency was model-dependent, we found that higher-proficiency prompts consistently yielded more correct code across all models. These results demonstrate that natural language proficiency is a key lever for controlling code generation, helping developers tailor AI output and improve the reliability of solutions.
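The evaluation protocol described above can be sketched in code. This is a minimal illustration, not the authors' implementation: the prompt-rewriting and code-generation functions are hypothetical stand-ins for real LLM calls, and the demo task is a toy analogue of a HumanEval entry (prompt plus unit test, scored by functional correctness).

```python
# Sketch of the per-level pass-rate evaluation, assuming HumanEval-style
# tasks given as (prompt, unit_test) pairs. rewrite_prompt and
# generate_code are placeholders for LLM calls in the actual study.

CEFR_LEVELS = ["A2", "B1", "B2", "C1", "C2"]

def rewrite_prompt(prompt: str, level: str) -> str:
    # Placeholder: an LLM would restate the prompt at the target CEFR
    # proficiency level while preserving the task specification.
    return f"[{level}] {prompt}"

def generate_code(prompt: str) -> str:
    # Placeholder model: returns a fixed solution for the demo task.
    return "def add(a, b):\n    return a + b"

def passes_tests(code: str, test: str) -> bool:
    # Run the generated code and the task's unit test in a fresh
    # namespace; any exception (including AssertionError) counts as a fail.
    namespace = {}
    try:
        exec(code, namespace)
        exec(test, namespace)
        return True
    except Exception:
        return False

def pass_rate_by_level(tasks):
    # Fraction of tasks whose generated code passes its unit test,
    # computed separately for each prompt proficiency level.
    rates = {}
    for level in CEFR_LEVELS:
        passed = sum(
            passes_tests(generate_code(rewrite_prompt(prompt, level)), test)
            for prompt, test in tasks
        )
        rates[level] = passed / len(tasks)
    return rates

demo_tasks = [
    ("Write a function add(a, b) that returns the sum of a and b.",
     "assert add(2, 3) == 5"),
]
print(pass_rate_by_level(demo_tasks))
```

With real LLM calls substituted in, comparing the per-level rates directly yields the reported effect of prompt proficiency on correctness.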