When is Generated Code Difficult to Comprehend? Assessing AI Agent Python Code Proficiency in the Wild

📅 2026-03-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the limited understanding of the cognitive complexity of AI-generated Python code, and of the developer proficiency required to comprehend it, in real-world development contexts. Leveraging the AIDev dataset, the authors conduct the first systematic evaluation of linguistic proficiency in AI-generated code within actual pull requests from open-source projects, mapping code from 591 pull requests to CEFR levels (A1–C2) using the static analysis tool pycefr. The findings reveal that over 90% of AI-generated constructs fall into the A1/A2 beginner categories, with fewer than 1% reaching the C2 mastery level. Notably, the proficiency distribution closely mirrors that of human-written code, and higher-proficiency AI-generated code predominantly arises from feature-addition and bug-fixing tasks. This work empirically demonstrates a link between task type and code complexity, offering foundational insights into the maintainability of AI-generated code.
📝 Abstract
The rapid adoption of AI coding agents is fundamentally shifting software developers' roles from code authors to code reviewers. While developers spend a significant portion of their time reading and comprehending code, the linguistic proficiency and complexity of the Python code generated by these agents remain largely unexplored. This study investigates the code proficiency of AI agents to determine the skill level developers need in order to maintain their code. Leveraging the AIDev dataset, we mined 591 pull requests containing 5,027 Python files generated by three distinct AI agents, and analyzed the code with pycefr, a static analysis tool that maps Python constructs to six proficiency levels ranging from A1 (Basic) to C2 (Mastery). Our results reveal that: AI agents predominantly generate Basic-level code, with over 90% of constructs falling into the A1 and A2 categories and fewer than 1% classified as Mastery (C2); AI agents' and humans' pull requests share a broadly similar proficiency profile; and high-proficiency code from AI agents comes chiefly from feature-addition and bug-fixing tasks. These findings suggest that while AI-generated code is generally accessible to developers with basic Python skills, specific tasks may require advanced proficiency to review and maintain complex, agent-generated constructs.
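The core idea the abstract describes — statically mapping each Python construct to a CEFR-style proficiency level and aggregating counts per file — can be sketched with the standard `ast` module. The level table below is a hypothetical illustration only; pycefr's actual mapping covers far more constructs and may assign different levels.

```python
import ast

# Hypothetical construct-to-level table for illustration; NOT pycefr's
# real mapping. Each AST node type stands in for one Python construct.
LEVEL_BY_NODE = {
    ast.Assign: "A1",            # plain assignment
    ast.For: "A1",               # for loop
    ast.FunctionDef: "A2",       # function definition
    ast.ListComp: "B1",          # list comprehension
    ast.Lambda: "B2",            # lambda expression
    ast.Yield: "C1",             # generator
    ast.AsyncFunctionDef: "C2",  # async def
}

def classify_constructs(source: str) -> dict:
    """Count how many constructs of each proficiency level appear in source."""
    counts = {}
    for node in ast.walk(ast.parse(source)):
        level = LEVEL_BY_NODE.get(type(node))
        if level is not None:
            counts[level] = counts.get(level, 0) + 1
    return counts

snippet = """
def squares(n):
    return [i * i for i in range(n)]
"""
print(classify_constructs(snippet))  # {'A2': 1, 'B1': 1}
```

Running such a classifier over every Python file in a pull request and summing the per-level counts yields the kind of A1–C2 distribution the study reports.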
Problem

Research questions and friction points this paper is trying to address.

AI coding agents
code comprehension
code proficiency
Python code
software maintenance
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI code comprehension
code proficiency assessment
pycefr
CEFR for code
AI coding agents