Modeling Code: Is Text All You Need?

📅 2025-07-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Purely sequential, text-only modeling struggles to capture program control- and data-flow semantics.

Method: We propose a text-graph joint modeling framework that integrates graph neural network modules into a Transformer backbone, explicitly incorporating structured program representations, such as abstract syntax trees and control flow graphs, alongside token sequences. This design combines the strong generative capacity of large language models with the fine-grained semantic modeling capability of graph-based methods.

Results: Our model achieves state-of-the-art or near-state-of-the-art performance on code generation, cross-lingual code translation, and code summarization across multiple benchmarks, while maintaining scalability.

Contribution: A lightweight, plug-and-play architecture, presented as the first to enable efficient co-processing of textual sequences and multi-granularity program graphs within a unified framework, demonstrating the critical role of structural priors in improving generalization and robustness for code intelligence tasks.
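The summary's core idea, running a GNN over program-graph nodes and fusing the result with token embeddings, can be sketched minimally as below. The mean-aggregation message-passing rule, the concatenation-based fusion, and all shapes are illustrative assumptions for this sketch, not the paper's actual architecture:

```python
import numpy as np

def gnn_layer(node_feats, adj, w):
    """One round of mean-aggregation message passing: each node averages
    its neighbors' features, adds its own, then applies a linear map + ReLU.
    (Illustrative rule; the paper's GNN module is not specified here.)"""
    deg = adj.sum(axis=1, keepdims=True).clip(min=1.0)
    agg = (adj @ node_feats) / deg            # neighbor mean
    return np.maximum(0.0, (node_feats + agg) @ w)

def fuse(token_feats, node_feats):
    """Concatenate a pooled graph context vector onto every token embedding,
    a simple stand-in for whatever fusion the real model uses."""
    graph_ctx = node_feats.mean(axis=0)       # (d,)
    ctx = np.broadcast_to(graph_ctx, token_feats.shape)
    return np.concatenate([token_feats, ctx], axis=-1)

rng = np.random.default_rng(0)
d = 8
tokens = rng.normal(size=(5, d))              # 5 code-token embeddings
nodes = rng.normal(size=(4, d))               # 4 AST/CFG node embeddings
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 1],
                [0, 1, 1, 0]], dtype=float)   # toy control-flow adjacency
w = rng.normal(size=(d, d)) * 0.1

nodes = gnn_layer(nodes, adj, w)
joint = fuse(tokens, nodes)
print(joint.shape)                            # (5, 16)
```

The fused token features could then feed the Transformer's next layer, which is how a "plug-and-play" graph module can sit alongside an existing LLM backbone without replacing it.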

📝 Abstract
Code LLMs have become extremely popular recently for modeling source code across a variety of tasks, such as generation, translation, and summarization. However, transformer-based models are limited in their ability to reason about structured, analytical properties of code, such as control and data flow. Previous work has explored the modeling of these properties with structured data and graph neural networks. However, these approaches lack the generative capabilities and scale of modern LLMs. In this work, we introduce a novel approach to combine the strengths of modeling code both as text and in more structured forms.
Problem

Research questions and friction points this paper is trying to address.

Code LLMs lack structured reasoning over analytical properties of code, such as control and data flow
Graph-based structured approaches lack the generative capabilities and scale of modern LLMs
Can jointly modeling code as text and as structured graphs improve performance?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Joint modeling of code as text and as structured graphs
Augmenting a Transformer backbone with structured program analysis (ASTs, control flow graphs)
Plug-and-play integration of graph neural network modules with LLMs
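To make "structured program representation" concrete, the sketch below derives a parent-child AST edge list from source code using Python's standard `ast` module. The `ast_edges` helper and this flat edge-list encoding are assumptions for illustration; the paper's actual multi-granularity graph construction is not specified here:

```python
import ast

def ast_edges(source):
    """Parse `source` and return (node_labels, edges), where each edge
    links a parent AST node's index to a child's index."""
    tree = ast.parse(source)
    labels, edges, index = [], [], {}
    # First pass: assign every node an integer id and record its type name.
    for node in ast.walk(tree):
        index[id(node)] = len(labels)
        labels.append(type(node).__name__)
    # Second pass: emit one edge per parent-child relation.
    for node in ast.walk(tree):
        for child in ast.iter_child_nodes(node):
            edges.append((index[id(node)], index[id(child)]))
    return labels, edges

labels, edges = ast_edges("def f(x):\n    return x + 1")
print(labels[0])   # Module
```

Node labels could be embedded like tokens and the edge list turned into the adjacency structure a GNN module consumes, which is one way text and graph views of the same program can coexist in a single model.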