Concurrency without Model Changes: Future-based Asynchronous Function Calling for LLMs

📅 2026-05-14

🤖 AI Summary
This work addresses the end-to-end latency that large language model (LLM) agents incur when decoding stalls during synchronous function calls. The authors propose AsyncFC, a framework that overlaps model decoding with function execution, and runs independent functions in parallel, without modifying the model architecture or the function implementations. Its key insight is that LLMs can natively reason about symbolic “futures” (placeholders for execution results that are not yet resolved), which makes non-intrusive asynchronous tool invocation possible. Built on this insight, AsyncFC employs a dependency-aware asynchronous scheduler based on future semantics while remaining compatible with existing synchronous protocols. Experiments show that AsyncFC significantly reduces task completion time on standard function-calling benchmarks and adapted software engineering benchmarks while preserving task accuracy.
📝 Abstract
Function calling, also known as tool use, is a core capability of modern LLM agents but is typically constrained by synchronous execution semantics. Under these semantics, LLM decoding is blocked until each function call completes, which increases end-to-end latency. In this work, we introduce AsyncFC, a pure execution-layer framework that decouples LLM decoding from function execution, enabling overlap between model decoding and function execution as well as inter-function parallelism when dependencies permit. AsyncFC layers over existing models and unmodified function implementations, requiring no fine-tuning or changes to the standard synchronous function-calling protocol. Across standard function-calling benchmarks and adapted software engineering benchmarks, AsyncFC significantly reduces end-to-end task completion time while preserving task accuracy. Furthermore, these results reveal that LLMs possess a native capability to reason over symbolic futures that represent unresolved execution results, enabling an asynchronous paradigm for model-tool interaction.
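The future-based idea in the abstract can be illustrated with a small sketch. This is not AsyncFC's implementation; it is a minimal illustration using Python's asyncio (3.9+), and every name in it (`FutureScheduler`, the `$f1`-style placeholders, the three toy tools) is hypothetical. Each call returns a symbolic placeholder immediately, a call whose argument is a placeholder waits only for that dependency, and the unmodified synchronous functions run in worker threads.

```python
import asyncio
import time


class FutureScheduler:
    """Hypothetical sketch: hands out symbolic placeholders, resolves them in the background."""

    def __init__(self):
        self._tasks = {}  # placeholder name -> asyncio.Task

    def submit(self, func, *args):
        """Start func without blocking and return a placeholder such as '$f1'."""
        name = f"$f{len(self._tasks) + 1}"
        self._tasks[name] = asyncio.create_task(self._run(func, args))
        return name

    async def _run(self, func, args):
        resolved = []
        for arg in args:
            # An argument naming an earlier placeholder is a dependency: wait for it.
            # (Assumption: literal string arguments never collide with placeholder names.)
            if isinstance(arg, str) and arg in self._tasks:
                arg = await self._tasks[arg]
            resolved.append(arg)
        # Run the unmodified synchronous function in a worker thread.
        return await asyncio.to_thread(func, *resolved)

    async def resolve(self, name):
        """Wait for the concrete result behind a placeholder."""
        return await self._tasks[name]


# Toy synchronous tools; sleep stands in for I/O latency.
def fetch_user(uid):
    time.sleep(0.2)
    return {"id": uid, "city": "Paris"}


def fetch_weather(user):
    time.sleep(0.2)
    return f"sunny in {user['city']}"


def fetch_news(topic):
    time.sleep(0.2)
    return f"headlines about {topic}"


async def demo():
    sched = FutureScheduler()
    start = time.perf_counter()
    user = sched.submit(fetch_user, 7)           # placeholder returned at once
    weather = sched.submit(fetch_weather, user)  # depends on the first call
    news = sched.submit(fetch_news, "llm")       # independent, runs in parallel
    await asyncio.sleep(0.1)  # stands in for the model continuing to decode
    results = [await sched.resolve(n) for n in (user, weather, news)]
    return (user, weather, news), results, time.perf_counter() - start


if __name__ == "__main__":
    names, results, elapsed = asyncio.run(demo())
    print(names, results, f"{elapsed:.2f}s")
```

Run synchronously, the three calls plus the simulated decoding step would take roughly the sum of their delays. In this sketch only the dependent pair is serialized, so the independent call and the decoding step overlap with it.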
Problem

Research questions and friction points this paper is trying to address.

function calling
asynchronous execution
LLM agents
end-to-end latency
tool use
Innovation

Methods, ideas, or system contributions that make the work stand out.

asynchronous function calling
concurrency
symbolic futures
execution-layer framework
LLM tool use