🤖 AI Summary
This study addresses the challenge of transforming source code into numerical embeddings suitable for machine learning. The authors propose a code embedding method based on r-Complexity, which describes a program's dynamic behavior through generalized complexity functions. The embeddings are generated by analyzing runtime behavior across diverse inputs rather than the syntactic structure of the code, so they are intended to capture the underlying algorithm independently of how it is written. Evaluated on a real-world, multi-label dataset of Codeforces solutions spanning 11 classes, the embeddings, paired with an XGBoost classifier, are reported to attain a high average F1-score. These results support the usefulness of r-Complexity-based embeddings for modeling source code.
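The idea of deriving an embedding from dynamic behaviour can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a single deterministic metric (a comparison count) stands in for the runtime metrics, that the candidate family in `CANDIDATES` is a reasonable stand-in for the generic complexity functions, and that fitting a scalar constant r so that metric(n) ≈ r · g(n) is an acceptable reading of r-Complexity. All names (`CANDIDATES`, `fit_constant`, `embed`, `best_class`, `count_comparisons_bubble_sort`) are hypothetical.

```python
import math

# Hypothetical candidate complexity functions g(n); the actual family used by
# the authors is not given in this text.
CANDIDATES = {
    "log n": lambda n: math.log2(n),
    "n": lambda n: float(n),
    "n log n": lambda n: n * math.log2(n),
    "n^2": lambda n: float(n) ** 2,
}


def fit_constant(sizes, values, g):
    """Least-squares constant r for values ~ r * g(n), plus a relative residual."""
    gs = [g(n) for n in sizes]
    r = sum(v * x for v, x in zip(values, gs)) / sum(x * x for x in gs)
    sse = sum((v - r * x) ** 2 for v, x in zip(values, gs))
    rel_residual = math.sqrt(sse / sum(v * v for v in values))
    return r, rel_residual


def embed(sizes, values):
    """Flatten [r, relative residual] for every candidate, in CANDIDATES order."""
    vector = []
    for g in CANDIDATES.values():
        r, rel_residual = fit_constant(sizes, values, g)
        vector.extend([r, rel_residual])
    return vector


def best_class(sizes, values):
    """Name of the candidate with the smallest relative residual."""
    return min(
        CANDIDATES,
        key=lambda name: fit_constant(sizes, values, CANDIDATES[name])[1],
    )


def count_comparisons_bubble_sort(n):
    """Toy instrumented program: comparisons made when sorting a reversed list."""
    data = list(range(n, 0, -1))
    comparisons = 0
    for i in range(n):
        for j in range(n - 1 - i):
            comparisons += 1
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
    return comparisons


# Run the program on inputs of growing size and record the metric.
sizes = [16, 32, 64, 128, 256]
values = [count_comparisons_bubble_sort(n) for n in sizes]

embedding = embed(sizes, values)   # 8 numbers: (r, residual) per candidate
label = best_class(sizes, values)  # best-fitting candidate for this metric
```

In this toy setting, two syntactically different quadratic sorting routines would yield nearly the same vector, which is the property that makes behaviour-based embeddings attractive for classifying algorithms. A real pipeline would presumably repeat the fit for several metrics and concatenate the results.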
📝 Abstract
This paper presents a generic method for transforming the source code of various algorithms into numerical embeddings by dynamically analysing the behaviour of computer programs on different inputs and by tailoring multiple generic complexity functions to the analysed metrics. The algorithm embeddings used are based on r-Complexity. Using the proposed code embeddings, we present an implementation of the XGBoost algorithm that achieves an average F1-score on a multi-label dataset with 11 classes, built using real-world code snippets submitted for programming competitions on the Codeforces platform.
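Since the headline metric is an average F1-score over a multi-label dataset, a short sketch of how such an average can be computed may help. The text does not say which averaging scheme is used, so both macro and micro averaging are shown as assumptions. The three-label toy data and the function names (`per_label_f1`, `macro_f1`, `micro_f1`) are hypothetical and unrelated to the Codeforces dataset.

```python
def _counts(y_true, y_pred, label):
    """True positives, false positives and false negatives for one label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t[label] and p[label])
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t[label] and p[label])
    fn = sum(1 for t, p in zip(y_true, y_pred) if t[label] and not p[label])
    return tp, fp, fn


def per_label_f1(y_true, y_pred, label):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the label never occurs."""
    tp, fp, fn = _counts(y_true, y_pred, label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-label F1 scores."""
    n_labels = len(y_true[0])
    return sum(per_label_f1(y_true, y_pred, k) for k in range(n_labels)) / n_labels


def micro_f1(y_true, y_pred):
    """F1 computed from counts pooled over all labels."""
    n_labels = len(y_true[0])
    totals = [_counts(y_true, y_pred, k) for k in range(n_labels)]
    tp = sum(c[0] for c in totals)
    fp = sum(c[1] for c in totals)
    fn = sum(c[2] for c in totals)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy multi-label data: each row is one program, each column one class (1 = has label).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]

macro = macro_f1(y_true, y_pred)
micro = micro_f1(y_true, y_pred)
```

Macro averaging weights every class equally, which matters when some of the classes are rare. Micro averaging is dominated by the frequent classes.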