🤖 AI Summary
Large language models (LLMs) handle dialogues about niche and proprietary APIs poorly because high-quality multi-turn training data is scarce. To address this, the paper proposes a lightweight, two-phase dialogue synthesis pipeline. In Phase I, a legacy dialogue planner turns symbolic dialogue-act scripts into structured dialogue plans, and an o4-mini teacher model realizes them as a gold set of dialogues, which is then used to fine-tune Llama 3.2 3B. In Phase II, the teacher is dropped and the fine-tuned student is paired with the same planner, so new dialogues can be synthesized cheaply without exposing source code to external services. The components are modular and open source. Compared with the base model, the fine-tuned student raises BLEU by 0.12 (0.38 → 0.50) and BERTScore by 0.03 (0.88 → 0.91) while running on a single consumer GPU. The core contribution is a low-cost, controllable approach to domain-specific dialogue data synthesis.
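The reported gains are absolute differences on a 0-to-1 scale. As a quick check, the short calculation below derives the absolute and relative improvements from the four scores quoted above; the helper name `gains` is made up for illustration and is not part of the released code.

```python
def gains(base: float, tuned: float) -> tuple[float, float]:
    """Return (absolute gain, relative gain in percent) for a metric."""
    absolute = tuned - base
    relative_pct = absolute / base * 100
    return round(absolute, 2), round(relative_pct, 1)

# Scores quoted in the summary: base model vs. fine-tuned student.
bleu = gains(0.38, 0.50)
bert = gains(0.88, 0.91)

print("BLEU:", bleu)       # (0.12, 31.6)
print("BERTScore:", bert)  # (0.03, 3.4)
```

BERTScore values for fluent text tend to cluster near the top of the scale, so its small absolute change is not directly comparable to the BLEU change.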
📝 Abstract
Large-language-model assistants are suitable for explaining popular APIs, yet they falter on niche or proprietary libraries because the multi-turn dialogue data needed for fine-tuning are scarce. We present APIDA-Chat, an open-source pipeline that converts symbolic dialogue-act "scripts" into realistic, domain-grounded API Search conversations using a lightweight model for inexpensive training data generation. Phase I pairs a legacy dialogue planner with a high-capability teacher LLM (o4-mini) to synthesize a "gold set" of realized dialogues; then, a smaller Llama 3.2 3B student model is fine-tuned on this corpus. Phase II drops the teacher and reuses the same planner with the fine-tuned model, allowing rapid, low-cost synthesis of new dialogues without exposing source code to external services. The fine-tuned student improves BLEU from 0.38 to 0.50 and BERTScore from 0.88 to 0.91 versus the base model while running entirely on a single consumer GPU. All components are modular and publicly released to serve as a conservative baseline for future work. APIDA-Chat is open-sourced at https://github.com/Zeberhart/apida-chat and a video demo is available at https://youtu.be/YqmZBHyGbPs .
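The two-phase design described in the abstract can be pictured as one planner feeding interchangeable "realizers". The sketch below is a minimal illustration under assumptions, not the APIDA-Chat implementation: the `DialogueAct` fields, the function names, and both stub realizers are hypothetical, and plain string templates stand in for the o4-mini teacher and the Llama 3.2 3B student.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DialogueAct:
    """One step of a symbolic dialogue-act script (hypothetical schema)."""
    speaker: str           # "user" or "assistant"
    act: str               # symbolic act label, e.g. "query" or "suggest"
    slots: Dict[str, str]  # act arguments, e.g. {"api": "resize_image"}


# A realizer turns one symbolic act into a natural-language utterance.
Realizer = Callable[[DialogueAct], str]


def plan_dialogue(topic: str) -> List[DialogueAct]:
    """Stand-in for the planner: returns a fixed three-act script."""
    return [
        DialogueAct("user", "query", {"topic": topic}),
        DialogueAct("assistant", "suggest", {"api": "resize_image"}),
        DialogueAct("user", "confirm", {}),
    ]


def realize(script: List[DialogueAct], realizer: Realizer) -> List[Dict[str, str]]:
    """Apply a realizer to every act, keeping the planned speaker order."""
    return [{"speaker": a.speaker, "text": realizer(a)} for a in script]


def _describe(act: DialogueAct) -> str:
    args = ", ".join(f"{k}={v}" for k, v in act.slots.items())
    return f"{act.act}({args})"


def teacher_stub(act: DialogueAct) -> str:
    """Placeholder for the Phase I teacher model (no real LLM call)."""
    return f"[teacher] {_describe(act)}"


def student_stub(act: DialogueAct) -> str:
    """Placeholder for the fine-tuned Phase II student (no real LLM call)."""
    return f"[student] {_describe(act)}"


script = plan_dialogue("image scaling")
gold_set = realize(script, teacher_stub)       # Phase I: planner + teacher
new_dialogue = realize(script, student_stub)   # Phase II: same planner, student only
```

Because the planner fixes the sequence of acts, swapping the realizer changes only the surface wording. In this sketch, that is the property that lets Phase II drop the teacher without altering the dialogue structure.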