Beyond LLMs: An Exploration of Small Open-source Language Models in Logging Statement Generation

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address inconsistent manual logging quality, privacy concerns, and the high computational overhead of large language models (LLMs), this work systematically investigates the feasibility of small open-source language models (SOLMs, 1B–14B parameters) for high-quality logging statement generation. We propose a synergistic fine-tuning framework integrating retrieval-augmented generation (RAG) and low-rank adaptation (LoRA), and are the first to empirically validate SOLMs' competitiveness in log generation. Our instruction-tuned Qwen2.5-Coder-14B model outperforms existing tools and LLM baselines on both logging location prediction and logging statement generation, achieving state-of-the-art (SOTA) results under both conventional metrics and LLM-as-a-judge evaluation. Cross-repository generalization tests further demonstrate strong transferability. The core contribution is establishing SOLMs as a new paradigm for log generation, balancing performance, data privacy, and inference efficiency.
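The low-rank adaptation (LoRA) technique named in the summary can be illustrated numerically: rather than updating a full weight matrix W during fine-tuning, LoRA trains a low-rank product B·A and uses W + B·A at inference. The sketch below is a generic, stdlib-only illustration of that idea (the dimensions and rank are hypothetical, not taken from the paper's setup):

```python
# Illustrative sketch of the LoRA idea (generic; not the paper's implementation).
# A frozen weight matrix W (d_out x d_in) is adapted by a low-rank update
# B @ A with rank r << min(d_out, d_in), so only A and B are trained.

def matmul(X, Y):
    """Naive matrix multiply for small illustrative matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def add(X, Y):
    """Element-wise matrix addition."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

# Parameter-count savings for a hypothetical 4096x4096 layer at rank 16
d_out, d_in, r = 4096, 4096, 16
full_params = d_out * d_in            # parameters in a full weight update
lora_params = r * (d_out + d_in)      # parameters in the low-rank update
print(f"full: {full_params:,}  LoRA: {lora_params:,}  "
      f"trainable fraction: {lora_params / full_params:.2%}")

# Tiny numeric example of the effective weight W_eff = W + B @ A
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen 2x2 weight
B = [[0.5], [0.25]]            # 2x1 (rank-1 update)
A = [[2.0, 4.0]]               # 1x2
W_eff = add(W, matmul(B, A))
print(W_eff)  # [[2.0, 2.0], [0.5, 2.0]]
```

At rank 16 the update trains well under 1% of the layer's parameters, which is why LoRA pairs naturally with resource-constrained SOLMs.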

📝 Abstract
Effective software maintenance heavily relies on high-quality logging statements, but manual logging is challenging, error-prone, and insufficiently standardized, often leading to inconsistent log quality. While large language models have shown promise in automatic logging, they introduce concerns regarding privacy, resource intensity, and adaptability to specific enterprise needs. To tackle these limitations, this paper empirically investigates whether Small Open-source Language Models (SOLMs) could become a viable alternative via proper exploitation. Specifically, we conduct a large-scale empirical study on four prominent SOLMs, systematically evaluating the impacts of various interaction strategies, parameter-efficient fine-tuning techniques, model sizes, and model types in automatic logging. Our key findings reveal that Retrieval-Augmented Generation significantly enhances performance, and LoRA is a highly effective PEFT technique. While larger SOLMs tend to perform better, this involves a trade-off with computational resources, and instruct-tuned SOLMs generally surpass their base counterparts. Notably, fine-tuned SOLMs, particularly Qwen2.5-Coder-14B, outperformed existing specialized tools and LLM baselines in accurately predicting logging locations and generating high-quality statements, a conclusion supported by traditional evaluation metrics and LLM-as-a-judge evaluations. Furthermore, SOLMs also demonstrated robust generalization across diverse, unseen code repositories.
Problem

Research questions and friction points this paper is trying to address.

Exploring SOLMs as alternatives to LLMs for logging generation
Evaluating SOLMs' performance in automatic logging statement creation
Assessing SOLMs' generalization across diverse code repositories
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval-Augmented Generation boosts logging performance
LoRA is effective for parameter-efficient fine-tuning
Instruct-tuned SOLMs outperform base models
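The retrieval-augmented generation step highlighted above can be sketched as follows: retrieve the code examples most similar to the query function and prepend them as few-shot demonstrations before asking the model to add a logging statement. This is a minimal, stdlib-only illustration using lexical Jaccard similarity; the paper's actual retriever, similarity measure, and prompt format may differ, and the mini-corpus below is hypothetical:

```python
# Hedged sketch of retrieval-augmented prompting for logging statement
# generation (generic illustration; not the paper's retriever or template).

def tokenize(code: str) -> set:
    """Crude lexical tokenization for similarity scoring."""
    return set(code.replace("(", " ").replace(")", " ").split())

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two token sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def build_rag_prompt(query_code: str, corpus: list, k: int = 2) -> str:
    """Prepend the k most similar (code, logged_code) demos to the query."""
    q = tokenize(query_code)
    ranked = sorted(corpus, key=lambda ex: jaccard(q, tokenize(ex[0])), reverse=True)
    demos = "\n\n".join(f"# Example\n{c}\n# With logging\n{lg}" for c, lg in ranked[:k])
    return f"{demos}\n\n# Task: add a logging statement\n{query_code}"

# Hypothetical mini-corpus of (plain code, code with a log statement) pairs
corpus = [
    ("def save(path): open(path)",
     'def save(path):\n    logger.info("saving %s", path)\n    open(path)'),
    ("def add(a, b): return a + b",
     'def add(a, b):\n    logger.debug("add %s %s", a, b)\n    return a + b'),
    ("def load(path): open(path)",
     'def load(path):\n    logger.info("loading %s", path)\n    open(path)'),
]

prompt = build_rag_prompt("def write(path): open(path)", corpus, k=2)
print(prompt)
```

The two file-handling examples outrank the arithmetic one for a file-handling query, so the demonstrations the model sees are the ones whose logging style is most likely to transfer.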