PyGress: Tool for Analyzing the Progression of Code Proficiency in Python OSS Projects

📅 2025-11-08
🤖 AI Summary
This study addresses the lack of standardized, longitudinal assessment of programming proficiency in open-source Python development. We propose the first automated method to adapt the Common European Framework of Reference for Languages (CEFR) to Python code proficiency evaluation. Our approach parses GitHub commit histories and leverages the pycefr static analysis engine alongside syntactic and semantic feature extraction to map code snippets automatically to CEFR levels (A1–C2). Temporal modeling and interactive web-based visualization enable dynamic tracking of individual developers’ and project-level proficiency evolution. Key contributions are: (1) establishing the first CEFR-informed paradigm for code competency assessment; and (2) releasing an open-source, reproducible end-to-end analysis pipeline and demonstration system. Empirical evaluation across major Python repositories confirms the method’s validity, interpretability, and practical utility in quantifying skill progression over time.

📝 Abstract
Assessing developer proficiency in open-source software (OSS) projects is essential for understanding project dynamics, particularly with respect to contributor expertise. This paper presents PyGress, a web-based tool that automatically evaluates and visualizes Python code proficiency using pycefr, a Python code proficiency analyzer. Given a GitHub repository link, the system extracts the commit history, classifies the source code into CEFR-aligned proficiency levels (A1 to C2), and generates visual summaries of individual and project-wide proficiency. PyGress visualizes the per-contributor proficiency distribution and tracks how project code proficiency progresses over time, offering an interactive way to explore contributor coding levels in Python OSS repositories. A video demonstration of the tool is available at https://youtu.be/hxoeK-ggcWk, and the source code is publicly available at https://github.com/MUICT-SERU/PyGress.
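The two summaries described in the abstract (per-contributor proficiency distribution and project-level progression over time) can be illustrated with a minimal Python sketch. It is not the PyGress implementation: the record layout, the sample data, and the function names `per_contributor_distribution` and `monthly_progression` are assumptions made for illustration, and the CEFR labels are taken as already assigned by an analyzer such as pycefr.

```python
from collections import Counter, defaultdict
from datetime import date

CEFR_LEVELS = ("A1", "A2", "B1", "B2", "C1", "C2")

# Hypothetical input: one record per detected code construct,
# as (author, commit date, CEFR level). Real data would come from an analyzer.
records = [
    ("alice", date(2024, 1, 10), "A1"),
    ("alice", date(2024, 1, 10), "B1"),
    ("bob", date(2024, 2, 3), "A2"),
    ("alice", date(2024, 2, 20), "C1"),
    ("bob", date(2024, 2, 21), "A2"),
]


def per_contributor_distribution(records):
    """Return, for each author, the share of their constructs at each CEFR level."""
    counts = defaultdict(Counter)
    for author, _, level in records:
        counts[author][level] += 1
    result = {}
    for author, level_counts in counts.items():
        total = sum(level_counts.values())
        result[author] = {lvl: level_counts[lvl] / total for lvl in CEFR_LEVELS}
    return result


def monthly_progression(records):
    """Return per-month counts of constructs at each CEFR level, in date order."""
    counts = defaultdict(Counter)
    for _, day, level in records:
        counts[day.strftime("%Y-%m")][level] += 1
    return {month: dict(counts[month]) for month in sorted(counts)}


if __name__ == "__main__":
    print(per_contributor_distribution(records))
    print(monthly_progression(records))
```

Grouping by calendar month is only one possible choice of time bucket; a per-release or per-commit grouping would follow the same pattern.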
Problem

Research questions and friction points this paper is trying to address.

Automatically evaluates Python code proficiency in OSS projects
Tracks progression of coding skills across CEFR levels over time
Visualizes individual and project-wide Python proficiency distribution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated Python code proficiency analysis using pycefr
Extracts GitHub commit histories for proficiency tracking
Visualizes individual and project-wide coding level progression
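The commit-history extraction step listed above can be sketched in Python as follows. This is a hedged illustration that assumes a locally cloned repository and the standard `git log` pretty-format placeholders (`%H`, `%an`, `%aI`, `%x1f`); the summary does not say how PyGress itself reads commit data, and the helper names `parse_git_log` and `read_git_log` are hypothetical.

```python
import subprocess
from datetime import datetime

FIELD_SEP = "\x1f"  # ASCII unit separator, unlikely to appear in author names
# %H = commit hash, %an = author name, %aI = author date in strict ISO 8601
LOG_FORMAT = "%H%x1f%an%x1f%aI"


def parse_git_log(text):
    """Parse `git log` output produced with LOG_FORMAT into commit records."""
    commits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        sha, author, iso_date = line.split(FIELD_SEP)
        commits.append(
            {"sha": sha, "author": author, "date": datetime.fromisoformat(iso_date)}
        )
    return commits


def read_git_log(repo_path):
    """Run `git log` in a local clone (assumed to exist), limited to Python files."""
    completed = subprocess.run(
        ["git", "-C", repo_path, "log", f"--pretty=format:{LOG_FORMAT}", "--", "*.py"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_git_log(completed.stdout)
```

Each record carries the author and timestamp needed to attribute analyzed code to a contributor and place it on a timeline; `read_git_log("path/to/clone")` would return the list for a repository cloned beforehand.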
Rujiphart Charatvaraphan
Faculty of Information and Communication Technology, Mahidol University, Thailand
Bunradar Chatchaiyadech
Faculty of Information and Communication Technology, Mahidol University, Thailand
Thitirat Sukijprasert
Faculty of Information and Communication Technology, Mahidol University, Thailand
Chaiyong Ragkhitwetsagul
Assistant Professor, Faculty of ICT, Mahidol University
Software Engineering, Mining Software Repositories, Code Similarity, Empirical Studies
Morakot Choetkiertikul
Faculty of Information and Communication Technology, Mahidol University, Thailand
R. Kula
Graduate School of Information Science and Technology, The University of Osaka, Japan
T. Sunetnanta
Faculty of Information and Communication Technology, Mahidol University, Thailand
Kenichi Matsumoto
NAIST
Software Engineering