🤖 AI Summary
General-purpose large language models (LLMs) exhibit weak domain adaptability, low reasoning accuracy, and inefficient, verbose in-context learning when applied to specialized IT operations micro-domains—e.g., Hitachi JP1 middleware.
Method: This paper proposes agent fine-tuning, an adaptation framework tailored to technical micro-domains. It constructs high-quality training data from domain-specific manuals, distills knowledge via LLM-generated reasoning trajectories, and introduces a context-answer extractor to improve the relevance of retrieved information. At inference, it combines retrieval-augmented generation (RAG) with structured agentic prompting to improve decision-making.
Contribution/Results: Evaluated on the JP1 certification exam task, the method achieves a 14% absolute accuracy improvement over baseline models. It significantly enhances both reasoning precision and search efficiency in complex, narrow technical domains—demonstrating superior domain specialization without requiring architectural modifications or extensive retraining.
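The training-data construction described above can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's actual pipeline: the teacher-LLM call that distills a reasoning trajectory is stubbed with a formatting function, and the record format and function names (`distill_trajectory`, `build_training_examples`) are assumptions.

```python
import json

def distill_trajectory(question: str, excerpt: str) -> str:
    """Stand-in for a teacher-LLM call that would produce a step-by-step
    reasoning trace grounded in the cited manual excerpt."""
    return (
        f"Thought: the manual states: {excerpt}\n"
        "Action: answer based on the cited passage."
    )

def build_training_examples(manual_sections: list[dict]) -> list[dict]:
    """Turn (question, manual excerpt) pairs into chat-style
    fine-tuning records containing distilled trajectories."""
    examples = []
    for sec in manual_sections:
        trajectory = distill_trajectory(sec["question"], sec["excerpt"])
        examples.append({
            "messages": [
                {"role": "user", "content": sec["question"]},
                {"role": "assistant", "content": trajectory},
            ]
        })
    return examples

# Toy manual section; real data would be extracted from JP1 documentation.
sections = [
    {"question": "How does JP1/AJS3 rerun a failed jobnet?",
     "excerpt": "Use the rerun operation from the abnormally ended unit."},
]
records = build_training_examples(sections)
print(json.dumps(records[0], indent=2))
```

In a real run, `distill_trajectory` would prompt a stronger LLM and the resulting records would be fed to a standard supervised fine-tuning loop.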
📝 Abstract
Agentic large language models (LLMs) have become prominent for autonomously interacting with external environments and performing multi-step reasoning tasks. Most approaches leverage these capabilities via in-context learning with few-shot prompts, but this often results in lengthy inputs and higher computational costs. Agent fine-tuning offers an alternative by enabling LLMs to internalize procedural reasoning and domain-specific knowledge through training on relevant data and demonstration trajectories. While prior studies have focused on general domains, their effectiveness in specialized technical micro-domains remains unclear. This paper explores agent fine-tuning for domain adaptation within Hitachi's JP1 middleware, a micro-domain for specialized IT operations. We fine-tuned LLMs using JP1-specific datasets derived from domain manuals and distilled reasoning trajectories generated by LLMs themselves, enhancing decision-making accuracy and search efficiency. During inference, we used an agentic prompt with retrieval-augmented generation and introduced a context-answer extractor to improve information relevance. On JP1 certification exam questions, our method achieved a 14% performance improvement over the base model, demonstrating the potential of agent fine-tuning for domain-specific reasoning in complex micro-domains.
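The inference-time loop (RAG retrieval followed by a context-answer extractor) can be sketched as below. This is a toy illustration under stated assumptions: retrieval is naive word overlap standing in for the paper's retriever, the extractor is a simple relevance filter standing in for what would likely be an LLM-based step, and all names (`retrieve`, `extract_relevant`) are illustrative.

```python
def retrieve(question: str, corpus: list[str], k: int = 3) -> list[str]:
    """Toy retriever: rank passages by word overlap with the question."""
    q_terms = set(question.lower().split())
    scored = sorted(
        corpus,
        key=lambda p: len(q_terms & set(p.lower().split())),
        reverse=True,
    )
    return scored[:k]

def extract_relevant(question: str, passages: list[str]) -> list[str]:
    """Context-answer extractor: keep only passages that share terms with
    the question, so the model answers from relevant context only."""
    q_terms = set(question.lower().split())
    return [p for p in passages if q_terms & set(p.lower().split())]

# Tiny in-memory stand-in for a JP1 manual index.
corpus = [
    "JP1/AJS3 schedules jobnets according to calendar definitions.",
    "JP1/IM centralizes event monitoring across hosts.",
    "Unrelated note about office supplies.",
]
question = "How does JP1/AJS3 schedule jobnets?"
context = extract_relevant(question, retrieve(question, corpus))
print(context)
```

The filtered `context` would then be placed into the agentic prompt for the fine-tuned model; the filtering step is what the abstract refers to as improving information relevance.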