🤖 AI Summary
Small language models (SLMs) suffer from schema misalignment in tool-augmented systems: owing to naming conventions internalized during pretraining, they frequently generate semantically plausible but nonexistent tool names, causing invocation failures. This work proposes PA-Tool, a fine-tuning-free paradigm that reverses the usual adaptation direction: tool names are reconfigured to align with the SLM's pretrained knowledge, rather than model parameters being adapted to arbitrary schemas. Crucially, it repurposes the *peakedness* signal from contamination detection to quantify name familiarity, selecting candidate names with high output concentration across samples and thereby substantially mitigating hallucination. Evaluated on MetaTool and RoTBench, the approach improves accuracy by up to 17 percentage points and reduces schema-misalignment errors by 80%, achieving performance competitive with large language models while incurring minimal computational overhead.
📝 Abstract
Small language models (SLMs) offer significant computational advantages for tool-augmented AI systems, yet they struggle with tool-use tasks, particularly in selecting appropriate tools and identifying correct parameters. A common failure mode is schema misalignment: models hallucinate plausible but non-existent tool names that reflect naming conventions internalized during pretraining but absent from the provided tool schema. Rather than forcing models to adapt to arbitrary schemas, we propose adapting schemas to align with models' pretrained knowledge. We introduce PA-Tool (Pretraining-Aligned Tool Schema Generation), a training-free method that leverages peakedness, a signal from contamination detection indicating pretraining familiarity, to automatically rename tool components. By generating multiple candidates and selecting those with the highest output concentration across samples, PA-Tool identifies pretrain-aligned naming patterns. Experiments on MetaTool and RoTBench show improvements of up to 17 percentage points, with schema misalignment errors reduced by 80%. PA-Tool enables small models to approach state-of-the-art performance while maintaining computational efficiency, adapting to new tools without retraining. Our work demonstrates that schema-level interventions can unlock the tool-use potential of resource-efficient models by adapting schemas to models rather than models to schemas.
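The core selection step described above (sample several candidate names, keep the one the model's outputs concentrate on) can be sketched as follows. This is a minimal illustration of peakedness-as-output-concentration, not the paper's implementation; the function name and the example samples are hypothetical.

```python
from collections import Counter

def most_peaked(samples):
    """Return the most frequent candidate and its output concentration
    (fraction of samples it occupies). A higher concentration suggests
    the name is more familiar from pretraining."""
    counts = Counter(samples)
    top, top_count = counts.most_common(1)[0]
    return top, top_count / len(samples)

# Hypothetical: the model was sampled 8 times for a name for the same tool.
samples = ["search_web", "search_web", "web_search", "search_web",
           "search_web", "searchInternet", "search_web", "search_web"]
name, score = most_peaked(samples)
# "search_web" dominates (6 of 8 samples, concentration 0.75),
# so it would be chosen as the pretrain-aligned name.
```

In practice the candidates would come from repeated stochastic decoding of the same renaming prompt, and ties or low-concentration cases would need a fallback (e.g., keeping the original schema name).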