Complexity-Based Code Embeddings

📅 2026-01-01

🏛️ International Conference on Computational Collective Intelligence

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

This study addresses the problem of transforming source code into numerical embeddings suitable for machine learning. The authors propose code embeddings based on r-Complexity, a generalized complexity function, and grounded in the dynamic behavior of programs. The representations are generated by analyzing runtime behavior across diverse inputs rather than the syntactic structure of the code, which is intended to let them generalize across algorithms. The evaluation uses a real-world, multi-label dataset with 11 classes, built from solutions submitted to the Codeforces platform, where the embeddings are paired with an XGBoost classifier and scored by average F1. The authors present these results as evidence that r-Complexity is an effective and practical basis for semantic modeling of source code.
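
The idea of building an embedding from runtime behavior rather than syntax can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the candidate growth families, the use of a comparison counter as the measured metric, and the helper names `fit_family`, `embed`, and `best_family` are hypothetical and are not the paper's r-Complexity definition.

```python
import math
import random

# Hypothetical candidate growth families; the actual function set used by
# r-Complexity is not given in this summary.
FAMILIES = {
    "log n": lambda n: math.log2(n),
    "n": lambda n: float(n),
    "n log n": lambda n: n * math.log2(n),
    "n^2": lambda n: float(n * n),
}


def fit_family(sizes, costs, g):
    """Least-squares fit of costs ~ c * g(n); returns (c, relative residual)."""
    gs = [g(n) for n in sizes]
    c = sum(y * x for y, x in zip(costs, gs)) / sum(x * x for x in gs)
    sse = sum((y - c * x) ** 2 for y, x in zip(costs, gs))
    sst = sum(y * y for y in costs)
    return c, math.sqrt(sse / sst)


def embed(measure, sizes):
    """Run measure(n) for each input size and return a flat embedding vector."""
    costs = [measure(n) for n in sizes]
    vec = []
    for g in FAMILIES.values():
        vec.extend(fit_family(sizes, costs, g))  # (coefficient, residual) pair
    return vec


def bubble_sort_comparisons(n):
    """Instrumented program: counts the comparisons made by bubble sort."""
    rng = random.Random(0)
    data = [rng.random() for _ in range(n)]
    count = 0
    for i in range(n):
        for j in range(n - 1 - i):
            count += 1
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
    return count


def linear_scan_comparisons(n):
    """Instrumented program: a full linear scan makes one comparison per item."""
    return n


def best_family(vec):
    """Name of the family with the smallest relative residual."""
    residuals = vec[1::2]
    return list(FAMILIES)[residuals.index(min(residuals))]


sizes = [16, 32, 64, 128, 256]
quadratic_vec = embed(bubble_sort_comparisons, sizes)
linear_vec = embed(linear_scan_comparisons, sizes)
```

Two programs with different syntax but the same growth behavior would land close together in this vector space, which is the property the summary attributes to the approach. A real pipeline would presumably measure several metrics per program, not a single counter.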

📝 Abstract

This paper presents a generic method for transforming the source code of various algorithms into numerical embeddings, by dynamically analysing the behaviour of computer programs against different inputs and by tailoring multiple generic complexity functions for the analysed metrics. The algorithm embeddings used are based on r-Complexity. Using the proposed code embeddings, we present an implementation of the XGBoost algorithm that achieves an average F1-score on a multi-label dataset with 11 classes, built using real-world code snippets submitted for programming competitions on the Codeforces platform.
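
For readers unfamiliar with how an average F1-score is computed on a multi-label dataset, here is a minimal sketch. The abstract does not say which averaging scheme is used, so macro-averaging (the unweighted mean of per-label F1) is an assumption, and the function name `macro_f1` and the three-label toy data are hypothetical; the paper's dataset has 11 classes.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over binary indicator rows."""
    n_labels = len(y_true[0])
    scores = []
    for k in range(n_labels):
        pairs = [(t[k], p[k]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        # Assumption: a label with no positives and no predictions scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


# Toy data: each row is one code snippet, each column one label.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1]]
score = macro_f1(y_true, y_pred)  # per-label F1: 1, 1/2, 2/3
```

Macro-averaging weights every label equally regardless of how often it occurs; a micro-averaged or sample-averaged score on the same predictions would generally differ.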

Problem

Research questions and friction points this paper is trying to address.

code embeddings

source code representation

program complexity

numerical representation

multi-label classification

Innovation

Methods, ideas, or system contributions that make the work stand out.

code embeddings

r-Complexity

dynamic program analysis