AsyncVoice Agent: Real-Time Explanation for LLM Planning and Reasoning

📅 2025-10-17

🤖 AI Summary

Problem: Users struggle to follow, and intervene in, a large language model's (LLM's) internal reasoning process in real time during complex reasoning tasks. Method: This paper proposes an asynchronous, decoupled architecture that verbalizes the LLM's "thought stream" in real time while letting users interrupt, ask questions, and give guidance by voice. It connects a streaming LLM backend to a speech frontend through asynchronous communication, low-latency bidirectional synchronization, and real-time text-to-speech (TTS) and automatic speech recognition (ASR). Contribution/Results: The system introduces a speech-first paradigm for real-time, intervenable reasoning that departs from the conventional input-output model. It reduces interaction latency by more than 600× relative to monolithic baselines while maintaining high reasoning fidelity and competitive task accuracy. The work targets explainable, controllable human-AI collaborative reasoning through low-latency, multimodal voice interaction.

📝 Abstract

Effective human-AI collaboration on complex reasoning tasks requires that users understand and interact with the model's process, not just receive an output. However, the monolithic text produced by methods like Chain-of-Thought (CoT) prevents this, because current interfaces lack real-time verbalization and robust support for user barge-in (interrupting the system while it is speaking). We present AsyncVoice Agent, a system whose asynchronous architecture decouples a streaming LLM backend from a conversational voice frontend. This design lets narration and inference run in parallel, so users can interrupt, query, and steer the model's reasoning process at any time. Objective benchmarks show that this approach reduces interaction latency by more than 600× compared to monolithic baselines while maintaining high fidelity and competitive task accuracy. By enabling a two-way dialogue with a model's thought process, AsyncVoice Agent offers a new paradigm for building more effective, steerable, and trustworthy human-AI systems for high-stakes tasks.
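The decoupling idea can be illustrated with a minimal producer/consumer sketch. This is not the authors' implementation: it assumes the backend and frontend communicate through an `asyncio.Queue`, uses `asyncio.sleep` as a stand-in for LLM generation time and a list append as a stand-in for TTS, and treats a barge-in as cancelling both tasks. All function and parameter names (`reasoning_backend`, `narrator`, `simulated_user`, `run_session`, `step_delay`, `interrupt_after`) are hypothetical.

```python
import asyncio
import time


async def reasoning_backend(steps, queue, step_delay, stats, t0):
    """Hypothetical stand-in for a streaming LLM: emits reasoning steps one at a time."""
    for step in steps:
        await asyncio.sleep(step_delay)      # simulate the time to generate this step
        await queue.put(step)                # publish immediately, not after the whole chain
    stats["reasoning_done"] = time.perf_counter() - t0
    await queue.put(None)                    # sentinel: reasoning has finished


async def narrator(queue, spoken, stats, t0):
    """Hypothetical voice frontend: narrates each step as soon as it arrives."""
    while (step := await queue.get()) is not None:
        # Record only the first arrival: the delay before the user hears anything.
        stats.setdefault("first_audio", time.perf_counter() - t0)
        spoken.append(step)                  # a real frontend would pass this text to TTS


async def simulated_user(barge_in, after):
    """Simulated listener who barges in after `after` seconds (None means never)."""
    if after is not None:
        await asyncio.sleep(after)
        barge_in.set()


async def run_session(steps, step_delay=0.01, interrupt_after=None):
    queue, barge_in = asyncio.Queue(), asyncio.Event()
    spoken, stats, t0 = [], {}, time.perf_counter()
    backend = asyncio.create_task(reasoning_backend(steps, queue, step_delay, stats, t0))
    frontend = asyncio.create_task(narrator(queue, spoken, stats, t0))
    user = asyncio.create_task(simulated_user(barge_in, interrupt_after))
    barge_wait = asyncio.create_task(barge_in.wait())
    # Inference and narration run concurrently; stop when narration ends or the user barges in.
    await asyncio.wait({frontend, barge_wait}, return_when=asyncio.FIRST_COMPLETED)
    interrupted = barge_in.is_set()
    tasks = (backend, frontend, user, barge_wait)
    for task in tasks:
        task.cancel()                        # in this sketch, a barge-in halts narration and inference
    await asyncio.gather(*tasks, return_exceptions=True)
    return spoken, interrupted, stats
```

For example, `asyncio.run(run_session([f"step {i}" for i in range(5)]))` narrates all five steps, and `stats["first_audio"]` is roughly one `step_delay` rather than the full chain's duration, which is the kind of gap the latency comparison with monolithic baselines measures. Passing a longer chain with `interrupt_after=0.05` stops both tasks partway through. Whether a real barge-in cancels, pauses, or redirects inference is an assumption of this sketch.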

Problem

Research questions and friction points this paper is trying to address.

Real-time verbalization of the LLM's reasoning process

Enabling user interruption during AI model execution

Reducing interaction latency in human-AI collaboration systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

An asynchronous architecture decouples the streaming LLM backend from the voice frontend

Parallel narration and inference enable real-time interruption

A voice interface lets users steer the reasoning process interactively
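Steering requires the reverse channel of the "bidirectional synchronization" mentioned in the summary: user input must reach the reasoning loop while it runs. A minimal sketch, assuming transcribed user speech arrives as text on an `asyncio.Queue` and the backend folds it into its context between reasoning steps; `steerable_backend`, `demo`, the plan strings, and the guidance text are all hypothetical, and the paper's actual mechanism may differ.

```python
import asyncio


async def steerable_backend(plan, control, emitted, step_delay=0.05):
    """Hypothetical reasoning loop that folds in user guidance between steps."""
    context = []                              # stands in for the model's running prompt/context
    for subgoal in plan:
        # Drain any guidance that arrived (e.g. from ASR) without blocking inference.
        while not control.empty():
            context.append(f"[user] {control.get_nowait()}")
        latest = context[-1] if context else "none"
        # A real backend would condition the next LLM call on `context`;
        # here we only record which guidance each step saw.
        emitted.append(f"{subgoal} | guidance: {latest}")
        await asyncio.sleep(step_delay)       # simulate the time to generate this step
    return context


async def demo():
    control, emitted = asyncio.Queue(), []
    plan = ["parse the problem", "choose a method", "solve", "check the answer"]
    backend = asyncio.create_task(steerable_backend(plan, control, emitted))
    await asyncio.sleep(0.075)                # the user speaks while reasoning is in progress
    await control.put("use the quadratic formula")
    context = await backend
    return emitted, context
```

Running `asyncio.run(demo())` shows the early steps reporting no guidance and the later steps reporting the user's hint. Checking the queue with `get_nowait` between steps keeps inference from ever waiting on the user, at the cost of guidance taking effect only at the next step boundary.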
