Concurrency without Model Changes: Future-based Asynchronous Function Calling for LLMs

📅 2026-05-14

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses end-to-end latency in large language models (LLMs) caused by decoding stalls during synchronous function calls. The authors propose AsyncFC, a framework that overlaps model decoding with function execution, and runs independent functions in parallel, without modifying the model or the function implementations. The work also reports that LLMs can natively reason about symbolic "futures" (placeholders for results that have not yet been computed), which is what makes non-intrusive asynchronous tool invocation possible. Building on this observation, AsyncFC uses a dependency-aware asynchronous scheduler based on future semantics while remaining compatible with the existing synchronous function-calling protocol. Experiments on standard function-calling benchmarks and adapted software engineering benchmarks show that AsyncFC substantially reduces task completion time while preserving task accuracy.
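As a rough illustration of future-based dispatch, the Python asyncio sketch below lets a caller submit a function call and get back a symbolic handle immediately. Each call is launched as soon as the handles it takes as arguments have resolved, so independent calls run concurrently. `FutureRef`, `FutureScheduler`, and the three toy tools are hypothetical names invented for this example. This is not AsyncFC's implementation or interface, which the summary does not describe.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Dict


@dataclass(frozen=True)
class FutureRef:
    """Hypothetical symbolic handle for a function result that may not exist yet."""
    call_id: int


class FutureScheduler:
    """Hypothetical dependency-aware scheduler: a call runs as soon as every
    FutureRef among its arguments has resolved."""

    def __init__(self, tools: Dict[str, Callable[..., Awaitable[Any]]]) -> None:
        self.tools = tools
        self.tasks: Dict[int, "asyncio.Task[Any]"] = {}
        self.next_id = 0

    def submit(self, name: str, /, **kwargs: Any) -> FutureRef:
        # Hand back a placeholder immediately so the caller (e.g. a decoding loop) is not blocked.
        ref = FutureRef(self.next_id)
        self.next_id += 1
        self.tasks[ref.call_id] = asyncio.create_task(self._run(name, kwargs))
        return ref

    async def _run(self, name: str, kwargs: Dict[str, Any]) -> Any:
        # Wait only for the futures this particular call depends on.
        resolved = {}
        for key, value in kwargs.items():
            if isinstance(value, FutureRef):
                value = await self.resolve(value)
            resolved[key] = value
        return await self.tools[name](**resolved)

    async def resolve(self, ref: FutureRef) -> Any:
        # Block only at the point where a concrete value is actually needed.
        return await self.tasks[ref.call_id]


# Toy tools with simulated I/O latency.
async def get_city(user: str) -> str:
    await asyncio.sleep(0.01)
    return "Paris"


async def get_weather(city: str) -> str:
    await asyncio.sleep(0.01)
    return f"sunny in {city}"


async def translate(text: str, lang: str) -> str:
    await asyncio.sleep(0.01)
    return f"[{lang}] {text}"


async def main() -> tuple:
    sched = FutureScheduler({"get_city": get_city, "get_weather": get_weather, "translate": translate})
    city = sched.submit("get_city", user="alice")
    weather = sched.submit("get_weather", city=city)               # waits on `city`
    greeting = sched.submit("translate", text="hello", lang="fr")  # independent: runs concurrently
    return await sched.resolve(weather), await sched.resolve(greeting)


print(asyncio.run(main()))  # ('sunny in Paris', '[fr] hello')
```

Because `submit` never awaits, all three calls are registered before any of them finishes. `get_weather` waits on `get_city` through its `FutureRef` argument, while `translate` overlaps with both.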

📝 Abstract

Function calling, also known as tool use, is a core capability of modern LLM agents but is typically constrained by synchronous execution semantics. Under these semantics, LLM decoding is blocked until each function call completes, so end-to-end latency grows with every call. In this work, we introduce AsyncFC, a pure execution-layer framework that decouples LLM decoding from function execution, enabling overlap between model decoding and function execution as well as inter-function parallelism when dependencies permit. AsyncFC layers over existing models and unmodified function implementations, requiring no fine-tuning or changes to the standard synchronous function-calling protocol. Across standard function-calling benchmarks and adapted software engineering benchmarks, AsyncFC significantly reduces end-to-end task completion time while preserving task accuracy. Furthermore, these results reveal that LLMs possess a native capability to reason over symbolic futures that represent unresolved execution results, enabling an asynchronous paradigm for model-tool interaction.
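To show where the savings come from, the toy latency model below compares the two semantics. It rests on simplifying assumptions that are not from the paper: each call is emitted after a fixed stretch of decoding, decoding takes the same time under both semantics, and a final decoding segment needs every result. Under synchronous semantics every call stalls decoding, so all times add up. Under asynchronous semantics a call starts once it has been emitted and its dependencies have finished. `Call`, `synchronous_latency`, and `asynchronous_latency` are hypothetical helpers.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Call:
    decode_before: float  # decoding time spent before the model emits this call
    duration: float       # wall-clock execution time of the function
    deps: List[int] = field(default_factory=list)  # indices of earlier calls whose results it needs


def synchronous_latency(calls: List[Call], final_decode: float) -> float:
    # Decoding stalls for every call, so every segment adds up serially.
    return sum(c.decode_before + c.duration for c in calls) + final_decode


def asynchronous_latency(calls: List[Call], final_decode: float) -> float:
    # Decoding never stalls; each call starts once emitted and once its deps finish.
    clock = 0.0
    finish: List[float] = []
    for c in calls:
        clock += c.decode_before  # time at which the call is emitted
        start = max([clock] + [finish[d] for d in c.deps])
        finish.append(start + c.duration)
    # The final answer needs every result, so it waits for the slowest call.
    return max([clock] + finish) + final_decode


calls = [Call(1.0, 3.0), Call(1.0, 2.0), Call(0.5, 1.0, deps=[0])]
print(synchronous_latency(calls, 1.0))   # 9.5
print(asynchronous_latency(calls, 1.0))  # 6.0
```

In this example, calls 0 and 1 run concurrently with each other and with the remaining decoding, while call 2 still waits for call 0. That shortens the critical path from 9.5 to 6.0 time units.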

Problem

Research questions and friction points this paper is trying to address.

function calling

asynchronous execution

LLM agents

end-to-end latency

tool use

Innovation

Methods, ideas, or system contributions that make the work stand out.

asynchronous function calling

concurrency

symbolic futures