Exploring the Power of Diffusion Large Language Models for Software Engineering: An Empirical Investigation

📅 2025-10-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Autoregressive large language models (AR-LLMs) exhibit limitations in modeling code structure, suffer from high inference latency, and underperform on cross-file software engineering tasks. Method: This work presents the first systematic evaluation of diffusion large language models (DLLMs) across the software development lifecycle (code generation, defect detection, and program repair), examining whether their global bidirectional encoding and decoupled generation steps improve syntactic and semantic understanding of code. Contribution/Results: On a large-scale benchmark of 52,937 tasks, 7B-parameter DLLMs achieve a 30% average accuracy gain over AR-LLMs and a 113% gain on cross-file repair, while also reducing inference latency. The study provides an empirically grounded evaluation of DLLMs for software engineering and presents them as a promising alternative to AR-LLMs.

📝 Abstract

Autoregressive Large Language Models (AR-LLMs) are widely used in software engineering (SE) but face limitations in processing code structure information and suffer from high inference latency. Diffusion LLMs (DLLMs) offer a promising alternative with global bidirectional encoding and decoupled generation steps. This work presents the first comprehensive evaluation of DLLMs across the software development lifecycle, including code generation, defect detection, and program repair. On a large-scale benchmark of 52,937 tasks, 7B-parameter DLLMs outperform AR-LLMs with a 30% average accuracy improvement, achieving a 113% gain on cross-file repair, while maintaining superior efficiency and reduced latency. Our results establish DLLMs as a superior paradigm for SE tasks.
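
A note on reading the headline numbers: an accuracy gain above 100% can only be a relative gain, since accuracy itself is bounded by 100%. The abstract does not state whether the 30% average figure is relative or absolute. The sketch below shows the relative-gain arithmetic; the two accuracy values are hypothetical, chosen only so the result comes out at 113%, and are not taken from the paper.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Hypothetical accuracies (NOT from the paper), for illustration only.
baseline_acc = 0.20
improved_acc = 0.426

gain = relative_gain(baseline_acc, improved_acc)
points = (improved_acc - baseline_acc) * 100.0

print(f"relative gain: {gain:.0f}%")
print(f"absolute gain: {points:.1f} percentage points")
```

Under these assumed values, a 113% relative gain corresponds to 22.6 percentage points, which is why the baseline accuracy matters when interpreting the reported figure.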

Problem

Research questions and friction points this paper is trying to address.

Evaluating diffusion models' effectiveness in software engineering tasks

Overcoming autoregressive models' structural and latency limitations

Assessing performance across code generation, defect detection, and program repair

Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion LLMs enable bidirectional code encoding

Decoupled generation reduces inference latency

DLLMs outperform autoregressive models on software engineering tasks
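
To make the latency point above concrete, here is a toy sketch that counts model calls under two decoding styles. It assumes confidence-based parallel unmasking, a decoding scheme used by some masked diffusion language models; the source does not describe the decoder of the evaluated DLLMs. The names `toy_predict`, `toy_score`, `TARGET`, and `tokens_per_step` are hypothetical stand-ins, and the "models" simply look up a fixed answer.

```python
import random

MASK = None  # sentinel for a position that has not been generated yet
TARGET = "def add ( a , b ) : return a + b".split()  # toy 12-token output


def toy_predict(prefix):
    """Stand-in for an AR model: returns the next token given the prefix."""
    return TARGET[len(prefix)]


def toy_score(seq):
    """Stand-in for a diffusion model: proposes a (token, confidence) pair
    for every masked position, looking at the whole sequence at once."""
    masked = [i for i, t in enumerate(seq) if t is MASK]
    rng = random.Random(len(masked))  # deterministic fake confidences
    return {i: (TARGET[i], rng.random()) for i in masked}


def autoregressive_decode(length, predict):
    """Left-to-right decoding: one model call per generated token."""
    tokens, calls = [], 0
    for _ in range(length):
        tokens.append(predict(tokens))
        calls += 1
    return tokens, calls


def masked_diffusion_decode(length, score, tokens_per_step):
    """Start fully masked; each model call fills the most confident positions."""
    seq, calls = [MASK] * length, 0
    while MASK in seq:
        proposals = score(seq)
        calls += 1
        ranked = sorted(proposals, key=lambda p: proposals[p][1], reverse=True)
        for pos in ranked[:tokens_per_step]:
            seq[pos] = proposals[pos][0]
    return seq, calls


ar_tokens, ar_calls = autoregressive_decode(len(TARGET), toy_predict)
dm_tokens, dm_calls = masked_diffusion_decode(len(TARGET), toy_score, tokens_per_step=4)
print(ar_calls, dm_calls)
```

In this toy, the 12-token output takes 12 calls left to right and 3 calls when four positions are filled per step. Whether fewer calls yields lower wall-clock latency in practice depends on the cost of each call and on how many positions can be committed per step without hurting quality, neither of which this sketch models.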
