CodeBC: A More Secure Large Language Model for Smart Contract Code Generation in Blockchain

📅 2025-04-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the weak security awareness of large language models (LLMs) when generating smart contracts in low-resource languages like Solidity, and the scarcity of large-scale, human-annotated vulnerability data, this paper proposes CodeBC, a three-stage security-aligned fine-tuning framework that requires no manual vulnerability-location annotations. Instead of pairwise vulnerability annotations, the method uses coarse security and vulnerability tags to guide supervised fine-tuning, security-aware contrastive learning, and safety-enhanced generation, and it further integrates Solidity syntax constraints with patterns of common vulnerabilities. Built on CodeLlama, the approach achieves significant improvements across multiple smart contract benchmarks: higher BLEU and CodeBLEU scores, a higher compilation pass rate, and a 42.3% reduction in generated vulnerabilities. To the authors' knowledge, this is the first end-to-end secure code generation method for Solidity that operates without fine-grained human annotations, demonstrating both methodological novelty and practical deployability.

📝 Abstract
Large language models (LLMs) excel at generating code from natural language instructions, yet they often lack an understanding of security vulnerabilities. This limitation makes it difficult for LLMs to avoid security risks in generated code, particularly in high-security programming tasks such as smart contract development for blockchain. Researchers have attempted to enhance the vulnerability awareness of these models by training them to differentiate between vulnerable and fixed code snippets. However, this approach relies heavily on manually labeled vulnerability data, which is only available for popular languages like Python and C++. For low-resource languages like Solidity, used in smart contracts, large-scale annotated datasets are scarce and difficult to obtain. To address this challenge, we introduce CodeBC, a code generation model specifically designed for generating secure smart contracts in blockchain. CodeBC employs a three-stage fine-tuning approach based on CodeLlama, distinguishing itself from previous methods by not relying on pairwise vulnerability location annotations. Instead, it leverages vulnerability and security tags to teach the model the differences between vulnerable and secure code. During the inference phase, the model leverages security tags to generate secure and robust code. Experimental results demonstrate that CodeBC outperforms baseline models in terms of BLEU, CodeBLEU, and compilation pass rates, while significantly reducing vulnerability rates. These findings validate the effectiveness and cost-efficiency of our three-stage fine-tuning strategy, making CodeBC a promising solution for generating secure smart contract code.
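The abstract's core idea, conditioning generation on coarse security/vulnerability tags rather than on pairwise vulnerability-location annotations, can be illustrated with a minimal sketch. This is not the paper's implementation; the tag token names, field names, and helper functions below are assumptions made for illustration only.

```python
# Hypothetical sketch of tag-conditioned data construction, in the spirit of
# CodeBC's approach: each training target is prefixed with a coarse tag, and
# inference always conditions on the secure tag. Tag strings are assumptions.

SECURE_TAG = "<secure>"          # assumed tag token, not from the paper
VULNERABLE_TAG = "<vulnerable>"  # assumed tag token, not from the paper


def make_training_example(instruction: str, code: str, is_secure: bool) -> dict:
    """Build one fine-tuning example.

    Only a coarse security/vulnerability tag is attached; no line-level
    vulnerability-location annotation is required.
    """
    tag = SECURE_TAG if is_secure else VULNERABLE_TAG
    return {
        "prompt": f"{instruction}\n{tag}\n",
        "completion": code,
    }


def make_inference_prompt(instruction: str) -> str:
    """At inference time, always prepend the secure tag so generation is
    steered toward safe code."""
    return f"{instruction}\n{SECURE_TAG}\n"


# Example usage with a toy Solidity snippet:
example = make_training_example(
    "Write a Solidity function that withdraws Ether safely.",
    "function withdraw(uint amount) external { /* ... */ }",
    is_secure=True,
)
```

The design point this sketch captures is the cost argument: coarse tags can be assigned per snippet (e.g. from an existing vulnerability scanner or dataset label), which is far cheaper than the manual, line-level vulnerability localization that prior pairwise-annotation approaches require.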
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLMs to generate secure smart contract code
Reducing reliance on manually labeled vulnerability data
Improving security in low-resource languages like Solidity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Three-stage fine-tuning without vulnerability annotations
Leveraging security tags for secure code generation
Enhanced performance in BLEU and CodeBLEU metrics
Lingxiang Wang
Beihang University
NLP
Hainan Zhang
Beihang University
Dialogue Generation, Text Generation, Federated Learning, Natural Language Processing
Qinnan Zhang
School of Artificial Intelligence, Beihang University, Beijing, 100190, China; Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Beijing, 100190, China
Ziwei Wang
State Key Laboratory of Complex & Critical Software Environment, Beihang University, Beijing, 100190, China; School of Artificial Intelligence, Beihang University, Beijing, 100190, China; Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Beijing, 100190, China
Hongwei Zheng
Shanghai Jiao Tong University
Computer Vision, Federated Learning
Jin Dong
Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Beijing, 100190, China; Beijing Academy of Blockchain and Edge Computing, BABEC, Beijing, 100086, China
Zhiming Zheng
State Key Laboratory of Complex & Critical Software Environment, Beihang University, Beijing, 100190, China; School of Artificial Intelligence, Beihang University, Beijing, 100190, China; Beijing Advanced Innovation Center for Future Blockchain and Privacy Computing, Beijing, 100190, China