Static Program Slicing Using Language Models With Dataflow-Aware Pretraining and Constrained Decoding

📅 2026-04-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the difficulty current language models have in capturing data dependencies accurately during static program slicing, which often leads them to generate hallucinated code. The authors reformulate program slicing as a sequence-to-sequence prediction task and introduce a dataflow-aware pretraining strategy that combines statement reordering guided by data flow graphs with span masking. A constrained decoding mechanism then enforces both lexical and syntactic constraints on the generated slice. Evaluated on Java and Python benchmarks with compact language models such as CodeT5+, the proposed method, Sliceformer, consistently outperforms existing approaches, with up to a 22% gain in ExactMatch. The results indicate more precise slices and effective suppression of code hallucinations.
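To make the statement-reordering idea concrete, here is a minimal sketch of a dataflow-preserving permutation over straight-line code. It is an illustration under stated assumptions, not the authors' procedure: the `Stmt` record, the hand-supplied `defs`/`uses` sets, and the function names are all hypothetical, and the sketch treats read-after-write, write-after-read, and write-after-write pairs as ordering constraints, which may differ from the edges the paper derives from its data flow graphs.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Stmt:
    """Hypothetical statement record; defs/uses are supplied by hand here."""
    text: str
    defs: frozenset = frozenset()
    uses: frozenset = frozenset()


def dependency_edges(stmts):
    """Map each statement index j to the earlier indices that must precede it."""
    preds = {j: set() for j in range(len(stmts))}
    for j, later in enumerate(stmts):
        for i in range(j):
            earlier = stmts[i]
            raw = earlier.defs & later.uses   # later reads what earlier wrote
            war = earlier.uses & later.defs   # later overwrites what earlier read
            waw = earlier.defs & later.defs   # both write the same variable
            if raw or war or waw:
                preds[j].add(i)
    return preds


def dataflow_preserving_permutation(stmts, rng):
    """Return a random ordering of statement indices that keeps every
    dependency edge pointing forward (a random topological order)."""
    preds = dependency_edges(stmts)
    remaining = set(range(len(stmts)))
    order = []
    while remaining:
        # A statement is ready once all of its predecessors have been emitted.
        ready = sorted(j for j in remaining if not (preds[j] & remaining))
        pick = rng.choice(ready)
        order.append(pick)
        remaining.remove(pick)
    return order
```

For the four statements `a = 1`, `b = 2`, `c = a + b`, `d = c * 2`, only the first two can swap places, so a model trained to recover or recognize such permutations has to rely on data dependencies rather than surface order.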
📝 Abstract
Static program slicing is a fundamental software engineering technique for isolating code relevant to specific variables. While recent learning-based approaches using language models (LMs) show promise in automating slice prediction, they suffer from inaccurate dependency modeling and unconstrained generation, where LMs fail to capture precise data flow relations and produce slices containing hallucinated tokens and statements. To address these challenges, we propose Sliceformer, a novel approach that reformulates static program slicing as a sequence-to-sequence task using small language models such as CodeT5+. Sliceformer introduces two key innovations that directly target the identified limitations. First, to improve dependency modeling, we design dataflow-aware pretraining objectives that leverage data flow graphs (DFG) to teach models data dependencies through dataflow-preserving statement permutation and dataflow-aware span corruption. Second, to eliminate hallucination, we develop a constrained decoding mechanism that enforces both lexical and syntactic constraints. We evaluate Sliceformer on Java and Python program slicing benchmarks, demonstrating consistent improvements over state-of-the-art baselines with up to 22% gain in ExactMatch.
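The constrained decoding idea can be sketched as a greedy decoder that masks candidate tokens at every step. This is a toy illustration and not Sliceformer's implementation: the lexical constraint is assumed to mean "emit only tokens that occur in the source program", the syntactic constraint is reduced to bracket balancing, and `step_scores` is a hypothetical callable standing in for a model's next-token scores.

```python
EOS = "<eos>"
PAIRS = {")": "(", "]": "[", "}": "{"}   # closer -> matching opener
OPENERS = set(PAIRS.values())


def allowed_tokens(source_tokens, stack):
    """Tokens that satisfy both toy constraints for the current prefix."""
    allowed = set()
    for tok in source_tokens:
        if tok in PAIRS:
            # Syntactic: a closer is legal only if it matches the open bracket.
            if stack and stack[-1] == PAIRS[tok]:
                allowed.add(tok)
        else:
            allowed.add(tok)
    if not stack:
        # Syntactic: the slice may end only when all brackets are closed.
        allowed.add(EOS)
    return allowed


def constrained_greedy_decode(step_scores, source_tokens, max_len=50):
    """Greedy decoding restricted to tokens from the source program.

    step_scores(prefix) is assumed to return a dict of token -> score.
    If max_len is reached first, the output may still have open brackets.
    """
    source = set(source_tokens)          # lexical constraint
    output, stack = [], []
    for _ in range(max_len):
        scores = step_scores(output)
        allowed = allowed_tokens(source, stack)
        candidates = {t: s for t, s in scores.items() if t in allowed}
        if not candidates:
            break
        tok = max(candidates, key=candidates.get)
        if tok == EOS:
            break
        output.append(tok)
        if tok in OPENERS:
            stack.append(tok)
        elif tok in PAIRS:
            stack.pop()
    return output
```

Because disallowed tokens are filtered out before the argmax, a high-scoring identifier that never appears in the source program cannot be emitted, and neither can an unmatched closing bracket. A production decoder would apply the same kind of mask to model logits and would use a real grammar or parser state in place of the bracket stack.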
Problem

Research questions and friction points this paper is trying to address.

static program slicing
language models
dataflow modeling
hallucination
dependency modeling
Innovation

Methods, ideas, or system contributions that make the work stand out.

dataflow-aware pretraining
constrained decoding
static program slicing
sequence-to-sequence modeling
language models