HyFunc: Accelerating LLM-based Function Calls for Agentic AI through Hybrid-Model Cascade and Dynamic Templating

📅 2026-02-14
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the high computational redundancy and inference latency in large language models when generating structured function calls for intelligent agents, which hinders real-time applicability. To mitigate this, the authors propose an efficient, low-redundancy framework that integrates hybrid model cascading, soft token–guided lightweight retrieval, prefix-tuning of smaller models, and dynamic template injection. The approach is further supported by an extension to the vLLM inference engine to enable dynamic structured output generation. By eliminating redundant processing of function descriptions, full-sequence generation, and fixed syntactic overhead, the method achieves 80.1% accuracy on the BFCL benchmark with a reduced inference latency of 0.828 seconds, significantly outperforming comparable models in both efficiency and effectiveness.
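The soft-token-guided retrieval step of the cascade can be sketched as a toy: a large model distills the user request into a single vector (the "soft token"), which is used to rank the function library by similarity so that only relevant function descriptions reach the small model. This is an illustrative sketch under assumed interfaces, not the paper's implementation; the embeddings, `retrieve` helper, and the hard-coded vectors are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy function library: name -> embedding (in practice, learned vectors).
library = {
    "get_weather": [0.9, 0.1, 0.0],
    "send_email":  [0.1, 0.9, 0.1],
    "set_alarm":   [0.0, 0.2, 0.9],
}

def retrieve(soft_token, k=1):
    """Rank functions by similarity to the soft token and keep the top-k,
    so the small model's prompt holds only relevant descriptions instead
    of the entire library."""
    ranked = sorted(library, key=lambda n: cosine(soft_token, library[n]),
                    reverse=True)
    return ranked[:k]

# A (hypothetical) large model distills "What's the weather in Paris?"
# into a single soft-token vector; the retriever then narrows the library:
soft_token = [0.85, 0.15, 0.05]
print(retrieve(soft_token))  # ['get_weather']
```

The point of the design is that the large model runs once to produce one token's worth of output, rather than re-reading the full function library and generating the whole call sequence on every request.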

πŸ“ Abstract
While agentic AI systems rely on LLMs to translate user intent into structured function calls, this process is fraught with computational redundancy, leading to high inference latency that hinders real-time applications. This paper identifies and addresses three key redundancies: (1) the redundant processing of a large library of function descriptions for every request; (2) the redundant use of a large, slow model to generate an entire, often predictable, token sequence; and (3) the redundant generation of fixed, boilerplate parameter syntax. We introduce HyFunc, a novel framework that systematically eliminates these inefficiencies. HyFunc employs a hybrid-model cascade where a large model distills user intent into a single "soft token." This token guides a lightweight retriever to select relevant functions and directs a smaller, prefix-tuned model to generate the final call, thus avoiding redundant context processing and full-sequence generation by the large model. To eliminate syntactic redundancy, our "dynamic templating" technique injects boilerplate parameter syntax on-the-fly within an extended vLLM engine. To avoid potential limitations in generalization, we evaluate HyFunc on an unseen benchmark dataset, BFCL. Experimental results demonstrate that HyFunc achieves an excellent balance between efficiency and performance. It achieves an inference latency of 0.828 seconds, outperforming all baseline models, and reaches an accuracy of 80.1%, surpassing all models with a comparable parameter scale. These results suggest that HyFunc offers a more efficient paradigm for agentic AI. Our code is publicly available at https://github.com/MrBlankness/HyFunc.
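The "dynamic templating" idea from the abstract can be illustrated with a minimal sketch: the fixed parameter syntax of a call is injected as a template, so the model only has to produce the variable argument values rather than every boilerplate token. This toy stands in for the paper's extended vLLM engine; the schema format and the `build_template`/`fill_template` helpers are assumptions made for illustration.

```python
# Toy illustration of dynamic templating: boilerplate call syntax is fixed
# up front, and only the value "slots" are left for the model to generate.

def build_template(func_name, param_names):
    """Build a call template with one placeholder slot per parameter."""
    slots = ", ".join(f"{p}={{{p}}}" for p in param_names)
    return f"{func_name}({slots})"

def fill_template(template, values):
    """Inject model-generated values into the fixed syntax."""
    return template.format(**values)

# Suppose the retriever selected this function schema for the request.
template = build_template("get_weather", ["city", "unit"])
print(template)  # get_weather(city={city}, unit={unit})

# A (hypothetical) small model now only generates the slot values:
call = fill_template(template, {"city": '"Paris"', "unit": '"celsius"'})
print(call)  # get_weather(city="Paris", unit="celsius")
```

Since the parentheses, parameter names, commas, and equals signs are injected rather than decoded token by token, the number of tokens the model must generate per call shrinks to roughly the argument values themselves.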
Problem

Research questions and friction points this paper is trying to address.

LLM-based function calls
computational redundancy
inference latency
agentic AI
real-time applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid-Model Cascade
Dynamic Templating
Function Call Acceleration
Agentic AI
Inference Latency Reduction