APIDA-Chat: Structured Synthesis of API Search Dialogues to Bootstrap Conversational Agents

📅 2025-10-04
🤖 AI Summary
Large language models (LLMs) handle dialogues about niche and proprietary APIs poorly because high-quality multi-turn training data is scarce. To address this, the paper proposes a lightweight, two-phase dialogue synthesis pipeline. In Phase I, a classical dialogue planner generates symbolic dialogue-act scripts, and an o4-mini teacher model realizes them as a gold set of dialogues, which is used to fine-tune Llama 3.2 3B. In Phase II, the teacher is dropped and the fine-tuned student realizes new scripts from the same planner, so synthesis runs locally without sending source code to external services. Relative to the base model, the fine-tuned student improves BLEU from 0.38 to 0.50 (+0.12) and BERTScore from 0.88 to 0.91 (+0.03) while running on a single consumer GPU. The components are modular and open source. The core contribution is a low-cost, controllable approach to domain-specific dialogue data synthesis.

📝 Abstract
Large-language-model assistants are suitable for explaining popular APIs, yet they falter on niche or proprietary libraries because the multi-turn dialogue data needed for fine-tuning are scarce. We present APIDA-Chat, an open-source pipeline that converts symbolic dialogue-act "scripts" into realistic, domain-grounded API Search conversations using a lightweight model for inexpensive training data generation. Phase I pairs a legacy dialogue planner with a high-capability teacher LLM (o4-mini) to synthesize a "gold set" of realized dialogues; then, a smaller Llama 3.2 3B student model is fine-tuned on this corpus. Phase II drops the teacher and reuses the same planner with the fine-tuned model, allowing rapid, low-cost synthesis of new dialogues without exposing source code to external services. The fine-tuned student improves BLEU from 0.38 to 0.50 and BERTScore from 0.88 to 0.91 versus the base model while running entirely on a single consumer GPU. All components are modular and publicly released to serve as a conservative baseline for future work. APIDA-Chat is open-sourced at https://github.com/Zeberhart/apida-chat and a video demo is available at https://youtu.be/YqmZBHyGbPs .
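
The two-phase flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the released APIDA-Chat code: `DialogueAct`, `plan_script`, `realize_dialogue`, `template_realizer`, `build_gold_set`, and the act labels are all hypothetical names, and the realizer here is a plain template standing in for an LLM call (the teacher in Phase I, the fine-tuned student in Phase II).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DialogueAct:
    """One symbolic step of a script (hypothetical schema, not the paper's)."""
    speaker: str           # "user" or "assistant"
    act: str               # symbolic act label, e.g. "request_search"
    slots: Dict[str, str]  # domain grounding, e.g. {"query": "parse JSON"}


# A realizer turns one symbolic act plus the dialogue so far into an utterance.
Realizer = Callable[[DialogueAct, List[str]], str]


def plan_script(query: str, api_name: str) -> List[DialogueAct]:
    """Hypothetical stand-in for the dialogue planner: emits a fixed script."""
    return [
        DialogueAct("user", "request_search", {"query": query}),
        DialogueAct("assistant", "suggest_api", {"api": api_name}),
        DialogueAct("user", "confirm", {}),
    ]


def realize_dialogue(script: List[DialogueAct], realizer: Realizer) -> List[str]:
    """Realize a script turn by turn, passing earlier turns as context."""
    history: List[str] = []
    for act in script:
        history.append(realizer(act, history))
    return history


def template_realizer(act: DialogueAct, history: List[str]) -> str:
    """Placeholder for an LLM call; it only formats the act as text."""
    slots = ", ".join(f"{k}={v}" for k, v in sorted(act.slots.items()))
    return f"{act.speaker}: [{act.act}] {slots}".rstrip()


def build_gold_set(scripts: List[List[DialogueAct]], teacher: Realizer) -> List[dict]:
    """Phase I: pair each script with its teacher-realized dialogue."""
    return [{"script": s, "dialogue": realize_dialogue(s, teacher)} for s in scripts]


script = plan_script("parse JSON", "json.loads")
gold_set = build_gold_set([script], template_realizer)        # Phase I (teacher)
new_dialogue = realize_dialogue(script, template_realizer)    # Phase II (student)
```

The point of the design is that the planner and the script format stay fixed across both phases; only the realizer is swapped, which is what lets Phase II run without the external teacher.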

Problem

Research questions and friction points this paper is trying to address.

Generating API search dialogues for niche libraries lacking training data
Creating realistic conversations from symbolic scripts using lightweight models
Enabling low-cost dialogue synthesis without exposing proprietary source code

Innovation

Methods, ideas, or system contributions that make the work stand out.

Pipeline converts symbolic dialogue-act scripts into domain-grounded API Search conversations
Teacher-student distillation fine-tunes a smaller Llama 3.2 3B student on dialogues realized by an o4-mini teacher
Modular open-source system enables low-cost dialogue synthesis