🤖 AI Summary
This survey addresses the central challenge of leveraging deep learning to understand, generate, and optimize code, a field termed Neural Code Intelligence (NCI). Methodologically, it spans program analysis, natural language processing, deep learning, and large language model techniques, covering sequence modeling, pretraining-finetuning paradigms, multi-task evaluation, and semantic representation. Based on a systematic review of more than 680 papers, over 50 representative models, and more than 20 task categories, the work establishes a comprehensive chronological taxonomy of NCI, from RNNs to modern LLMs, structured into four evolutionary paradigm stages. It further examines the synergies and cross-domain integration pathways between code intelligence and general-purpose machine intelligence. The contributions include a continuously updated resource repository (hosted on GitHub), a unified conceptual framework clarifying persistent technical bottlenecks (e.g., semantic fidelity, compositional generalization), and a pragmatic technology roadmap guiding both academic research and industrial deployment.
📝 Abstract
Neural Code Intelligence, which leverages deep learning to understand, generate, and optimize code, holds immense potential for transformative impact on society as a whole. Bridging the gap between Natural Language and Programming Language, this domain has drawn significant attention from researchers in both communities over the past few years. This survey presents a systematic and chronological review of the advancements in code intelligence, encompassing over 50 representative models and their variants, more than 20 categories of tasks, and extensive coverage of over 680 related works. We follow the historical progression to trace the paradigm shifts across different research phases (e.g., from modeling code with recurrent neural networks to the era of Large Language Models). Concurrently, we highlight the major technical transitions in models, tasks, and evaluations across these stages. Applications have undergone a co-evolving shift: from initial efforts targeting specific scenarios, through the exploration of a diverse array of tasks during the field's rapid expansion, to the current focus on increasingly complex and varied real-world challenges. Building on our examination of these developmental trajectories, we further investigate the emerging synergies between code intelligence and broader machine intelligence, uncovering new cross-domain opportunities and illustrating the substantial influence of code intelligence across various domains. Finally, we discuss both the opportunities and challenges in this field and offer our insights on the most promising research directions. An ongoing, dynamically updated project and resources associated with this survey have been released at https://github.com/QiushiSun/Awesome-Code-Intelligence.