Go-UT-Bench: A Fine-Tuning Dataset for LLM-Based Unit Test Generation in Go

📅 2025-11-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) for code underperform on practical software engineering tasks such as unit test generation, primarily due to training data skewed toward high-frequency tasks like code completion and a severe scarcity of high-quality, task-aligned data, especially for low-resource languages like Go. Method: We introduce Go-UT-Bench, the first dedicated benchmark for Go unit test generation, comprising 5,264 real-world code–test pairs from 10 open-source projects. Leveraging this benchmark, we systematically evaluate and fine-tune two architectures, mixture-of-experts and dense decoders, on the unit test generation task. Contribution/Results: Our work is the first to simultaneously address the evaluation and data gaps for unit test generation in low-resource programming languages. Fine-tuned models achieve significant improvements over base models on more than 75% of test cases. By mitigating data imbalance and improving task alignment, our approach substantially improves model practicality and generalization in real-world development scenarios.

📝 Abstract
Training data imbalance poses a major challenge for code LLMs. Most available data heavily overrepresents raw open-source code while underrepresenting broader software engineering tasks, especially in low-resource languages like Golang. As a result, models excel at code autocompletion but struggle with real-world developer workflows such as unit test generation. To address this gap, we introduce Go-UT-Bench, a benchmark dataset of 5,264 pairs of code and unit tests drawn from 10 permissively licensed Golang repositories spanning diverse domains. We evaluate its effectiveness as a fine-tuning dataset across two LLM families, i.e., mixture-of-experts and dense decoders. Our results show that fine-tuned models outperform their base counterparts on more than 75% of benchmark tasks.
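The abstract describes assembling code and unit-test pairs into a fine-tuning dataset. As a rough illustration only (not the paper's actual pipeline), a hypothetical `to_training_example` helper could format one Go code–test pair into a prompt/completion record; the field names `code` and `test` and the prompt wording are assumptions, not Go-UT-Bench's real schema:

```python
# Hypothetical sketch of turning one Go code-test pair into a fine-tuning
# record. The schema ("code", "test") and prompt text are assumptions,
# not Go-UT-Bench's actual format.

def to_training_example(pair: dict) -> dict:
    """Format one code-test pair as a prompt/completion record."""
    prompt = (
        "Write a Go unit test for the following function.\n\n"
        + pair["code"]
        + "\n"
    )
    return {"prompt": prompt, "completion": pair["test"]}

# A toy code-test pair in the spirit of the dataset.
pair = {
    "code": "func Add(a, b int) int { return a + b }",
    "test": (
        "func TestAdd(t *testing.T) {\n"
        "    if Add(1, 2) != 3 {\n"
        '        t.Fatal("unexpected sum")\n'
        "    }\n"
        "}"
    ),
}

example = to_training_example(pair)
print(example["prompt"])
```

Each record then serves as one supervised example: the model sees the function under test and is trained to emit the corresponding test.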
Problem

Research questions and friction points this paper is trying to address.

Addresses training data imbalance in code LLMs for low-resource languages
Improves unit test generation in the Go programming language
Provides a benchmark dataset to support real-world developer workflows
Innovation

Methods, ideas, or system contributions that make the work stand out.

Created a Go unit-test dataset for fine-tuning
Evaluated mixture-of-experts and dense decoder models
Fine-tuned models significantly outperformed their base versions
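The headline result, fine-tuned models beating their base counterparts on more than 75% of benchmark tasks, boils down to a per-task win rate. A minimal sketch with made-up scores (the metric shape and all numbers are illustrative, not the paper's):

```python
# Illustrative win-rate computation: the fraction of benchmark tasks on
# which the fine-tuned model scores higher than its base model.
# All scores below are invented for demonstration.

def win_rate(finetuned_scores, base_scores):
    """Fraction of tasks where the fine-tuned score beats the base score."""
    wins = sum(f > b for f, b in zip(finetuned_scores, base_scores))
    return wins / len(base_scores)

finetuned = [0.82, 0.91, 0.64, 0.77, 0.55, 0.88, 0.71, 0.93]
base = [0.60, 0.85, 0.70, 0.52, 0.40, 0.79, 0.66, 0.81]

rate = win_rate(finetuned, base)
print(f"fine-tuned wins on {rate:.0%} of tasks")
```

A win rate above 0.75 on the real benchmark is what the paper reports as "outperform on more than 75% of benchmark tasks".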
Authors
Yashshi Pipalani
Nutanix
Hritik Raj
Nutanix
Rajat Ghosh
Nutanix
Vaishnavi Bhargava
University of Wisconsin-Madison; Birla Institute of Technology & Science, Pilani, India
Debojyoti Dutta
Nutanix