🤖 AI Summary
To address the challenge of balancing domain specificity, cost-efficiency, and API security in automated code generation with foundation models, this paper proposes a lightweight fine-tuning framework tailored for enterprise private deployment. Methodologically, it integrates Low-Rank Adaptation (LoRA) with domain-specific instruction tuning, leveraging high-quality instruction data derived from internal enterprise JavaScript projects to efficiently fine-tune open-source foundation models on proprietary infrastructure. Compared with reliance on third-party APIs or full-parameter fine-tuning, the proposed approach reduces GPU memory consumption and training cost by 62% (measured in GPU-hours), while improving the generated code's domain relevance, syntactic correctness, and business alignment by an average of 19.3%. The core contribution lies in the first deep integration of LoRA with vertical-domain instruction tuning, establishing a secure, low-cost, and scalable paradigm for private code generation.
📝 Abstract
Context: Automated code generation using Foundation Models (FMs) offers promising solutions for enhancing software development efficiency. However, challenges remain in ensuring domain specificity, cost-effectiveness, and security, especially when relying on third-party APIs. This paper introduces CodeLSI, a framework that combines low-rank optimization and domain-specific instruction tuning to address these challenges.
Objectives: The aim of this study is to develop and evaluate CodeLSI, a novel approach for generating high-quality code tailored to specific domains, using FMs fine-tuned on company infrastructure without dependence on external APIs.
Methods: CodeLSI applies low-rank adaptation techniques to reduce the computational cost of model pre-training and fine-tuning. Domain-specific instruction tuning is employed to align code generation with organizational needs. We implemented and tested the framework on real-world JavaScript coding tasks using datasets drawn from internal software projects.
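The low-rank adaptation idea underlying the method can be illustrated with a minimal sketch: rather than updating a full weight matrix during fine-tuning, only two small matrices whose product forms a low-rank correction are trained. All names, shapes, and the rank value below are illustrative assumptions, not details of the CodeLSI implementation.

```python
import numpy as np

# Illustrative dimensions: a 512x512 layer adapted at rank 8.
d_out, d_in, rank = 512, 512, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection, zero-initialized

def lora_forward(x):
    # Effective weight is W + B @ A, computed without materializing the sum.
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
y = lora_forward(x)

# Trainable parameters drop from d_out*d_in to rank*(d_out + d_in).
full_params = d_out * d_in          # 262144
lora_params = rank * (d_out + d_in) # 8192, about 3% of the full matrix
print(full_params, lora_params)
```

Because B starts at zero, the adapted model initially reproduces the frozen model exactly; training then moves only the small A and B matrices, which is what keeps GPU memory and compute requirements low enough for on-premises infrastructure.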
Results: Experimental evaluations show that CodeLSI produces high-quality, context-aware code. It outperforms baseline models in terms of relevance, accuracy, and domain fit. The use of low-rank optimization significantly reduced resource requirements, enabling scalable training on company-owned infrastructure.
Conclusion: CodeLSI demonstrates that combining low-rank optimization with domain-specific tuning can enhance the practicality and performance of FMs for automated code generation. This approach provides a secure, cost-efficient alternative to commercial API-based solutions and supports faster, more targeted innovation in software development.