🤖 AI Summary
This study addresses the lack of standardized, longitudinal assessment of programming proficiency in open-source Python development. We propose the first automated method to adapt the Common European Framework of Reference for Languages (CEFR) to Python code proficiency evaluation. Our approach parses GitHub commit histories and leverages the pycefr static analysis engine alongside syntactic and semantic feature extraction to map code snippets automatically to CEFR levels (A1–C2). Temporal modeling and interactive web-based visualization enable dynamic tracking of individual developers’ and project-level proficiency evolution. Key contributions are: (1) establishing the first CEFR-informed paradigm for code competency assessment; and (2) releasing an open-source, reproducible end-to-end analysis pipeline and demonstration system. Empirical evaluation across major Python repositories confirms the method’s validity, interpretability, and practical utility in quantifying skill progression over time.
📝 Abstract
Assessing developer proficiency in open-source software (OSS) projects is essential for understanding project dynamics, especially for identifying contributor expertise. This paper presents PyGress, a web-based tool designed to automatically evaluate and visualize Python code proficiency using pycefr, a Python code proficiency analyzer. Given a GitHub repository link, the system extracts commit histories, analyzes source code proficiency across CEFR-aligned levels (A1 to C2), and generates visual summaries of individual and project-wide proficiency. PyGress visualizes the proficiency distribution of each contributor and tracks the progression of project code proficiency over time, offering an interactive way to explore contributor coding levels in Python OSS repositories. A video demonstration of PyGress can be found at https://youtu.be/hxoeK-ggcWk, and the source code of the tool is publicly available at https://github.com/MUICT-SERU/PyGress.
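To make the core idea concrete, the sketch below shows one way a CEFR-style classifier for Python snippets could work: parse the code with the standard `ast` module and grade the snippet by the most advanced construct it contains. The `CONSTRUCT_LEVELS` mapping here is a hypothetical, simplified illustration; pycefr's actual rule set is far larger and its level assignments differ.

```python
import ast

# Hypothetical construct-to-CEFR mapping, loosely inspired by the idea of
# grading language constructs; NOT pycefr's actual rule set.
CONSTRUCT_LEVELS = {
    ast.Assign: "A1",
    ast.For: "A2",
    ast.FunctionDef: "B1",
    ast.Lambda: "B2",
    ast.ListComp: "B2",
    ast.ClassDef: "C1",
    ast.AsyncFunctionDef: "C2",
}
ORDER = ["A1", "A2", "B1", "B2", "C1", "C2"]

def snippet_level(source: str) -> str:
    """Return the highest CEFR level among the constructs in the snippet."""
    tree = ast.parse(source)
    best = "A1"
    for node in ast.walk(tree):
        level = CONSTRUCT_LEVELS.get(type(node))
        if level and ORDER.index(level) > ORDER.index(best):
            best = level
    return best

print(snippet_level("x = 1"))                                     # → A1
print(snippet_level("def f(xs):\n    return [x*x for x in xs]"))  # → B2
```

Run per file at each commit and aggregated per author, such a classifier yields the level distributions and temporal progression curves that PyGress visualizes.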