SCLA: Automated Smart Contract Summarization via LLMs and Control Flow Prompt

📅 2024-02-07
🤖 AI Summary
Existing approaches that apply large language models (LLMs) to smart contract code summarization do not fully capture control-flow structure, which leads to semantic loss. To address this, we propose SCLA, which jointly encodes the control flow graph (CFG) and semantic facts drawn from abstract syntax tree (AST) nodes into a structured prompt, explicitly modeling the hierarchical control logic and implicit semantic dependencies of a contract. The approach comprises AST-based semantic extraction, CFG construction, and LLM prompt engineering, and it is evaluated with four automated metrics (BLEU-4, METEOR, ROUGE-L, BLEURT). On a dataset of 40,000 real-world smart contracts, it outperforms state-of-the-art baselines on all four metrics, with relative improvements of 26.7%, 23.2%, 16.7%, and 14.7% in BLEU-4, METEOR, ROUGE-L, and BLEURT, respectively. Higher-quality summaries in turn support contract maintenance and vulnerability mitigation.
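The reported figures are relative, not absolute, gains. Assuming the usual definition (the exact formula is not stated on this page), for a metric score s the relative improvement of SCLA over a baseline is:

```latex
\Delta_{\mathrm{rel}} = \frac{s_{\mathrm{SCLA}} - s_{\mathrm{baseline}}}{s_{\mathrm{baseline}}} \times 100\%
```

With illustrative numbers that are not taken from the paper, a baseline BLEU-4 of 0.30 and an SCLA score of 0.38 would give (0.38 - 0.30) / 0.30 ≈ 26.7%.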

📝 Abstract
Smart contract code summarization is crucial for efficient maintenance and vulnerability mitigation. While many studies use Large Language Models (LLMs) for summarization, their performance still falls short of fine-tuned models such as CodeT5+ and CodeBERT. Some approaches combine LLMs with data flow analysis but fail to fully capture the hierarchy and control structures of the code, leading to information loss and degraded summarization quality. We propose SCLA, an LLM-based method that enhances summarization by integrating a Control Flow Graph (CFG) and semantic facts from the code's control flow into a semantically enriched prompt. SCLA uses a control flow extraction algorithm to derive control flows from semantic nodes in the Abstract Syntax Tree (AST) and constructs the corresponding CFG. Code semantic facts refer to both explicit and implicit information within the AST that is relevant to smart contracts. This method enables LLMs to better capture the structural and contextual dependencies of the code. We validate the effectiveness of SCLA through comprehensive experiments on a dataset of 40,000 real-world smart contracts. The experiments show that SCLA significantly improves summarization quality, outperforming the state-of-the-art (SOTA) baselines with improvements of 26.7%, 23.2%, 16.7%, and 14.7% in BLEU-4, METEOR, ROUGE-L, and BLEURT scores, respectively.
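The abstract describes the pipeline only at a high level. The Python sketch below illustrates the general idea under our own assumptions: a simplified, hypothetical statement tree (`Stmt`) stands in for the Solidity AST, a textbook recursive CFG construction stands in for SCLA's control flow extraction algorithm, and `make_prompt` shows one possible way to serialize semantic facts and CFG edges into a prompt. None of these names, node kinds, or the prompt wording come from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical, simplified statement node (NOT the real solc AST schema).
@dataclass
class Stmt:
    kind: str                                              # "expr", "if", "while" or "return"
    text: str = ""                                         # statement text or branch/loop condition
    body: List["Stmt"] = field(default_factory=list)       # then-branch or loop body
    orelse: List["Stmt"] = field(default_factory=list)     # else-branch (may be empty)

Edge = Tuple[int, int, str]  # (source node, target node, branch tag)

class CFG:
    def __init__(self) -> None:
        self.labels: List[str] = []     # node id -> label
        self.edges: List[Edge] = []
        self.returns: List[int] = []    # return nodes, wired to EXIT at the end

    def node(self, label: str) -> int:
        self.labels.append(label)
        return len(self.labels) - 1

    def connect(self, preds: List[Tuple[int, str]], dst: int) -> None:
        # Every dangling (node, tag) predecessor gets an edge into dst.
        for src, tag in preds:
            self.edges.append((src, dst, tag))

def build(stmts: List[Stmt], cfg: CFG, preds: List[Tuple[int, str]]) -> List[Tuple[int, str]]:
    """Add stmts to cfg and return the dangling exits that fall through to the next statement."""
    for s in stmts:
        if s.kind == "expr":
            n = cfg.node(s.text)
            cfg.connect(preds, n)
            preds = [(n, "")]
        elif s.kind == "if":
            n = cfg.node(f"if ({s.text})")
            cfg.connect(preds, n)
            # Both branches fall through; an empty else keeps the "false" edge dangling.
            preds = build(s.body, cfg, [(n, "true")]) + build(s.orelse, cfg, [(n, "false")])
        elif s.kind == "while":
            n = cfg.node(f"while ({s.text})")
            cfg.connect(preds, n)
            cfg.connect(build(s.body, cfg, [(n, "true")]), n)  # back edge to the loop header
            preds = [(n, "false")]
        elif s.kind == "return":
            n = cfg.node(f"return {s.text}".strip())
            cfg.connect(preds, n)
            cfg.returns.append(n)
            preds = []  # nothing falls through a return
        else:
            raise ValueError(f"unknown statement kind: {s.kind}")
    return preds

def build_cfg(body: List[Stmt]) -> CFG:
    cfg = CFG()
    entry = cfg.node("ENTRY")
    exits = build(body, cfg, [(entry, "")])
    exit_node = cfg.node("EXIT")
    cfg.connect(exits + [(r, "") for r in cfg.returns], exit_node)
    return cfg

def make_prompt(func_name: str, facts: Dict[str, str], cfg: CFG) -> str:
    """Serialize semantic facts and CFG edges into one text prompt (illustrative format)."""
    lines = [f"Summarize the Solidity function `{func_name}`.", "Semantic facts:"]
    lines += [f"- {key}: {value}" for key, value in facts.items()]
    lines.append("Control flow (edges):")
    for src, dst, tag in cfg.edges:
        suffix = f" [{tag}]" if tag else ""
        lines.append(f"- {cfg.labels[src]} -> {cfg.labels[dst]}{suffix}")
    lines.append("Summary:")
    return "\n".join(lines)

# Usage on a hypothetical withdraw() function.
withdraw = [
    Stmt("expr", "require(balances[msg.sender] >= amount)"),
    Stmt("if", "amount == 0", body=[Stmt("return")]),
    Stmt("expr", "balances[msg.sender] -= amount"),
    Stmt("expr", "payable(msg.sender).transfer(amount)"),
]
cfg = build_cfg(withdraw)
facts = {"state variables written": "balances", "external calls": "transfer"}
prompt = make_prompt("withdraw", facts, cfg)
print(prompt)
```

Each statement is wired with a "dangling exits" scheme: a builder returns the edges still waiting for a successor, so branches merge naturally and a `return` cuts the fall-through path. Listing edges with their `true`/`false` tags is one simple way to expose branch structure to an LLM as plain text.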
Problem

Enhance smart contract summarization using LLMs and control flow analysis.
Address information loss by integrating CFG and semantic facts into prompts.
Improve summarization quality over SOTA models with significant metric gains.
Innovation

Integrates Control Flow Graph with LLMs
Uses AST for semantic fact extraction
Improves summarization quality over SOTA baselines on BLEU-4, METEOR, ROUGE-L, and BLEURT
Xiaoqi Li
Hainan University, Haikou, China
Yingjie Mao
Hainan University, Haikou, China
Zexin Lu
Sichuan University
Wenkai Li
Hainan University, Haikou, China
Zongwei Li
Hainan University, Haikou, China