🤖 AI Summary
Large language models (LLMs) for code underperform on practical software engineering tasks such as unit test generation, primarily because their training data skews toward high-frequency tasks like code completion and high-quality, task-aligned data is scarce, especially for low-resource languages like Go.
Method: We introduce GO UT Bench, the first dedicated benchmark for Go unit test generation, comprising 5,264 real-world code–test pairs drawn from 10 permissively licensed open-source projects. Leveraging this benchmark, we systematically evaluate and fine-tune two architectures, mixture-of-experts and dense decoders, on the unit test generation task.
Contribution/Results: Our work is the first to simultaneously address the evaluation and data gaps for unit test generation in low-resource programming languages. Fine-tuned models outperform their base counterparts on more than 75% of benchmark tasks. By mitigating data imbalance and improving task alignment, our approach substantially improves model practicality and generalization in real-world development scenarios.
📝 Abstract
Training data imbalance poses a major challenge for code LLMs. Most available data heavily overrepresents raw open-source code while underrepresenting broader software engineering tasks, especially in low-resource languages like Golang. As a result, models excel at code autocompletion but struggle with real-world developer workflows such as unit test generation. To address this gap, we introduce GO UT Bench, a benchmark dataset of 5,264 pairs of code and unit tests drawn from 10 permissively licensed Golang repositories spanning diverse domains. We evaluate its effectiveness as a fine-tuning dataset across two LLM families, i.e., mixture-of-experts and dense decoders. Our results show that fine-tuned models outperform their base counterparts on more than 75% of benchmark tasks.
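For concreteness, a code–unit-test pair of the kind such a benchmark collects might look like the following minimal sketch. The `Clamp` function and its table-driven test are hypothetical illustrations, not drawn from GO UT Bench itself:

```go
package main

import "fmt"

// Clamp restricts v to the range [lo, hi]. In a code–test pair, this
// would be the "focal" function the model must write a test for.
func Clamp(v, lo, hi int) int {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// The paired unit test, in idiomatic table-driven Go style, would be:
//
//	func TestClamp(t *testing.T) {
//		cases := []struct{ v, lo, hi, want int }{
//			{5, 0, 10, 5},   // in range: unchanged
//			{-3, 0, 10, 0},  // below range: clamped to lo
//			{42, 0, 10, 10}, // above range: clamped to hi
//		}
//		for _, c := range cases {
//			if got := Clamp(c.v, c.lo, c.hi); got != c.want {
//				t.Errorf("Clamp(%d, %d, %d) = %d, want %d",
//					c.v, c.lo, c.hi, got, c.want)
//			}
//		}
//	}

func main() {
	fmt.Println(Clamp(-3, 0, 10), Clamp(5, 0, 10), Clamp(42, 0, 10))
}
```

Generating the test half of such a pair, given only the focal function, is the task the benchmark evaluates.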