🤖 AI Summary
This work addresses the positional inductive bias inherent in code embeddings generated by large language models (LLMs), which hinders their ability to capture long-range, order-sensitive dependencies in source code. To mitigate this limitation, the paper proposes a hybrid LLM-RNN framework that uses GRU or BiGRU networks to sequentially refine embeddings produced by pre-trained LLMs such as RoBERTa, CodeBERT, and CodeT5, thereby strengthening sequential awareness and semantic modeling. The approach substantially alleviates positional bias, yielding consistent and statistically significant performance gains across three real-world datasets. Notably, on the defect detection benchmark, the CodeT5-GRU variant achieves the best accuracy of 67.90%, surpassing its standalone base model as well as the RoBERTa- and CodeBERT-based hybrids.
📝 Abstract
Contextual embeddings generated by LLMs exhibit strong positional inductive biases, which can limit their ability to fully capture long-range, order-sensitive dependencies in highly structured source code. Consequently, how to further refine and enhance LLM embeddings for improved code understanding remains an open research question. To address this gap, we propose a hybrid LLM-RNN framework that reinforces LLM-generated contextual embeddings with a sequential RNN architecture. This embedding-reprocessing step aims to reinforce sequential semantics and strengthen the order-aware dependencies inherent in source code. We evaluate the proposed hybrid models on both benchmark and real-world coding datasets. The experimental results show that the RoBERTa-BiGRU and CodeBERT-GRU models achieved accuracies of 66.40% and 66.03%, respectively, on the defect detection benchmark dataset, representing improvements of approximately 5.35% and 3.95% over the standalone RoBERTa and CodeBERT models. Furthermore, the CodeT5-GRU and CodeT5+-BiGRU models achieved accuracies of 67.90% and 67.79%, respectively, surpassing their base models and outperforming RoBERTa-BiGRU and CodeBERT-GRU by a notable margin. In addition, the CodeT5-GRU model attains weighted and macro F1-scores of 67.18% and 67.00%, respectively, on the same dataset. Extensive experiments across three real-world datasets further demonstrate consistent and statistically significant improvements over standalone LLMs. Overall, our findings indicate that reprocessing contextual embeddings with RNN architectures enhances code understanding performance in LLM-based models.
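The core idea of reprocessing LLM token embeddings with a (Bi)GRU before classification can be sketched as below. This is a minimal illustration, not the paper's exact head: the hidden size, pooling strategy, and class count are assumed values, and a random tensor stands in for the LLM's token embeddings (e.g. CodeBERT's `last_hidden_state`).

```python
import torch
import torch.nn as nn

class GRURefiner(nn.Module):
    """Reprocess contextual LLM token embeddings with a (Bi)GRU head.

    The GRU re-reads the embedding sequence in order (and in reverse
    when bidirectional), reinforcing order-sensitive dependencies
    before pooling into a single vector for classification.
    """

    def __init__(self, embed_dim=768, hidden_dim=256,
                 num_classes=2, bidirectional=True):
        super().__init__()
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True,
                          bidirectional=bidirectional)
        out_dim = hidden_dim * (2 if bidirectional else 1)
        self.classifier = nn.Linear(out_dim, num_classes)

    def forward(self, llm_embeddings):
        # llm_embeddings: (batch, seq_len, embed_dim), produced by a
        # pre-trained LLM such as CodeBERT or CodeT5's encoder.
        refined, _ = self.gru(llm_embeddings)   # (batch, seq_len, out_dim)
        pooled = refined.mean(dim=1)            # mean-pool over tokens
        return self.classifier(pooled)          # (batch, num_classes)

# Stand-in for LLM output: 4 code snippets, 128 tokens, 768-dim embeddings.
embeddings = torch.randn(4, 128, 768)
logits = GRURefiner()(embeddings)
print(logits.shape)  # torch.Size([4, 2])
```

In practice the LLM would be run (frozen or fine-tuned) over tokenized source code and its hidden states fed into the refiner, so the GRU layer adds only a small number of parameters on top of the base model.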