Improving Deep Assertion Generation via Fine-Tuning Retrieval-Augmented Pre-trained Language Models

📅 2025-02-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing assertion generation methods rely heavily on lexical matching and suffer from limited training data, leading to inadequate semantic understanding and low accuracy. To address these limitations, we propose RetriGen, a novel end-to-end framework that introduces a hybrid retriever jointly leveraging lexical (token-level) and semantic (embedding-level) retrieval, coupled with fine-tuning of a pre-trained language model (PLM) generator. This design enables cross-repository, semantics-aware assertion retrieval and precise generation. Evaluated on two established benchmarks, RetriGen achieves 57.66% assertion accuracy and 73.24% CodeBLEU, outperforming all baselines with average improvements of 50.66% and 14.14%, respectively. These results demonstrate substantial improvements in both the depth of semantic reasoning and the robustness of the generated assertions.

📝 Abstract
Unit testing validates the correctness of the units of the software system under test and serves as the cornerstone of improving software quality and reliability. To reduce the manual effort of writing unit tests, some techniques have been proposed to automatically generate test assertions, with recent integration-based approaches considered state-of-the-art. Despite being promising, such integration-based approaches face several limitations, including reliance on lexical matching for assertion retrieval and a limited training corpus for assertion generation. This paper proposes a novel retrieval-augmented deep assertion generation approach, namely RetriGen, based on a hybrid retriever and a pre-trained language model (PLM)-based generator. Given a focal-test, RetriGen first builds a hybrid assertion retriever to search for the most relevant Test-Assert Pair from external codebases. The retrieval process considers lexical similarity and semantic similarity via a token-based and an embedding-based retriever, respectively. RetriGen then treats assertion generation as a sequence-to-sequence task and designs a PLM-based assertion generator to predict a correct assertion. We conduct extensive experiments to evaluate RetriGen against six state-of-the-art approaches across two large-scale datasets and two metrics. The results demonstrate that RetriGen achieves 57.66% accuracy and 73.24% CodeBLEU, outperforming all baselines with average improvements of 50.66% and 14.14%, respectively.
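The hybrid retrieval step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: Jaccard token overlap stands in for the token-based retriever, cosine similarity over a caller-supplied `embed` function stands in for the embedding-based retriever, and the interpolation weight `alpha` is a hypothetical parameter.

```python
import math


def token_score(query_tokens, cand_tokens):
    """Lexical similarity: Jaccard overlap of token sets
    (a stand-in for the paper's token-based retriever)."""
    a, b = set(query_tokens), set(cand_tokens)
    return len(a & b) / len(a | b) if a | b else 0.0


def cosine(u, v):
    """Semantic similarity: cosine between embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0


def hybrid_retrieve(query, corpus, embed, alpha=0.5):
    """Return the assertion of the Test-Assert Pair whose focal-test
    scores highest under a weighted mix of lexical and semantic
    similarity. `corpus` is a list of (focal_test, assertion) pairs;
    `embed` maps text to a vector; `alpha` is an assumed weight."""
    q_tokens = query.split()
    q_vec = embed(query)
    scored = []
    for focal_test, assertion in corpus:
        score = (alpha * token_score(q_tokens, focal_test.split())
                 + (1 - alpha) * cosine(q_vec, embed(focal_test)))
        scored.append((score, assertion))
    return max(scored)[1]
```

In RetriGen proper, the retrieved Test-Assert Pair is then concatenated with the focal-test and fed to the fine-tuned PLM generator; the sketch above only covers the retrieval half.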
Problem

Research questions and friction points this paper is trying to address.

Automates unit test assertion generation
Enhances retrieval with hybrid methods
Improves accuracy using pre-trained models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid retriever for assertion retrieval
Pre-trained language model generator
Combines lexical and semantic similarity
Quanjun Zhang
Nanjing University of Science and Technology
Software Engineering · Software Testing · Automated Program Repair
Chunrong Fang
Software Institute, Nanjing University
Software Testing · Software Engineering · Computer Science
Yi Zheng
State Key Laboratory for Novel Software Technology, Nanjing University, China
Yaxin Zhang
State Key Laboratory for Novel Software Technology, Nanjing University, China
Yuan Zhao
Lanzhou University of Technology
Time Series Forecasting
Rubing Huang
Macau University of Science and Technology
AI for Software Engineering · Software Engineering for AI · Software Testing · AI Applications
Jianyi Zhou
Peking University
Software Testing
Yun Yang
Department of Computing Technologies, Swinburne University of Technology, Australia
Tao Zheng
State Key Laboratory for Novel Software Technology, Nanjing University, China
Zhenyu Chen
State Key Laboratory for Novel Software Technology, Nanjing University, China and Shenzhen Research Institute of Nanjing University, China