FMI_SU_Yotkova_Kastreva at SemEval-2026 Task 13: Lightweight Detection of LLM-Generated Code via Stylometric Signals

2026-05-05

Abstract
SemEval-2026 Task 13 investigates machine-generated code detection across multiple programming languages and application scenarios, asking participating systems to generalize to unseen languages and domains. This paper describes our participation in Subtask A (binary classification) and explores both pretrained code encoders and lightweight feature-based methods. We design ratio-based features that are less sensitive to snippet length. To support the extraction of descriptiveness-related signals, we use parsing engines and a programming-language classifier. Additionally, we train a separate code-vs-text line classifier to identify raw natural language segments embedded within samples. We combine a shallow decision tree with heuristic rules derived from data analysis to produce the final predictions. Our approach is computationally efficient, requires only CPU resources for training, and achieves near-instant inference time, offering a lightweight alternative to large pretrained models.
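The idea of ratio-based features can be illustrated with a minimal sketch. The abstract does not list the actual features, so the three ratios below (comment lines, blank lines, whitespace characters) and the function name `ratio_features` are illustrative assumptions, not the authors' feature set. The point is only that dividing each count by the snippet's size keeps the value stable as the snippet grows.

```python
def ratio_features(code: str) -> dict[str, float]:
    """Hypothetical length-normalized stylometric features (illustrative only)."""
    lines = code.splitlines()
    # Guard against division by zero on empty input.
    n_lines = max(len(lines), 1)
    n_chars = max(len(code), 1)

    # Assumption: a line is a comment if it starts with '#' or '//'.
    comment_lines = sum(
        1 for ln in lines if ln.lstrip().startswith(("#", "//"))
    )
    blank_lines = sum(1 for ln in lines if not ln.strip())
    whitespace_chars = sum(1 for ch in code if ch.isspace())

    return {
        "comment_line_ratio": comment_lines / n_lines,
        "blank_line_ratio": blank_lines / n_lines,
        "whitespace_char_ratio": whitespace_chars / n_chars,
    }


snippet = "# add\nx = 1\n\ny = 2\n"
features = ratio_features(snippet)
# Repeating the snippet triples its length but leaves every ratio unchanged.
same = ratio_features(snippet * 3) == features
```

Raw counts (for example, the number of comment lines) would triple in the repeated snippet, while the ratios stay the same, which is the sense in which such features are less sensitive to snippet length.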
Problem

Research questions and friction points this paper is trying to address.

LLM-generated code detection
stylometric signals
code authenticity
cross-language generalization
machine-generated code
Innovation

Methods, ideas, or system contributions that make the work stand out.

stylometric signals
lightweight detection
ratio-based features
code-vs-text classification
decision tree with heuristics
Authors
Elitsa Yotkova
Faculty of Mathematics and Informatics, Sofia University "St. Kliment Ohridski", Bulgaria
Violeta Kastreva
Faculty of Mathematics and Informatics, Sofia University "St. Kliment Ohridski", Bulgaria
Dimitar Dimitrov
Faculty of Mathematics and Informatics, Sofia University "St. Kliment Ohridski", Bulgaria
Ivan Koychev
Faculty of Mathematics and Informatics, Sofia University "St. Kliment Ohridski", Bulgaria
Preslav Nakov
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)
Computational Linguistics, Large Language Models, Fact-checking, Fake News