Large Language Models Are Read/Write Policy-Makers for Simultaneous Generation

📅 2025-01-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing large language models (LLMs) struggle to balance output latency and text quality in simultaneous generation tasks over streaming inputs, particularly in dynamically deciding *when* to emit tokens. This work introduces LSG, the first LLM-driven simultaneous generation framework, which requires neither fine-tuning nor dynamic programming and enables off-the-shelf open-source LLMs to both make read/write policy decisions and generate text. LSG takes the latency-minimizing generation policy as a baseline; referring to this baseline, the LLM devises an improved policy that better balances latency and generation quality, and writes its outputs accordingly. Evaluated on simultaneous machine translation and streaming automatic speech recognition, LSG substantially outperforms prior methods, achieving state-of-the-art performance and demonstrating that open-source LLMs can practically support real-world low-latency simultaneous generation.

Technology Category

Application Category

📝 Abstract
Simultaneous generation models write generation results while reading streaming inputs, necessitating a policy-maker to determine the appropriate output timing. Existing simultaneous generation methods generally adopt the traditional encoder-decoder architecture and learn the generation and policy-making capabilities through complex dynamic programming techniques. Although LLMs excel at text generation, they face challenges in taking on the role of policy-makers through traditional training methods, limiting their exploration in simultaneous generation. To overcome these limitations, we propose a novel LLM-driven Simultaneous Generation (LSG) framework, which allows the off-the-shelf LLM to decide the generation timing and produce output concurrently. Specifically, LSG selects the generation policy that minimizes latency as the baseline policy. Referring to the baseline policy, LSG enables the LLM to devise an improved generation policy that better balances latency and generation quality, and writes generation results accordingly. Experiments on simultaneous translation and streaming automatic speech recognition tasks show that our method can achieve state-of-the-art performance utilizing the open-source LLMs and demonstrate practicality in real-world scenarios.
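The read/write loop the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's actual method: `llm_decide` is a hypothetical stand-in for prompting the LLM with the source and target prefixes (here replaced by a simple length heuristic), and `translate_token` stands in for the LLM's token generation.

```python
def llm_decide(source_prefix, target_prefix):
    """Stand-in for the LLM's policy decision: return "WRITE" when the
    source prefix plausibly contains enough information to emit the next
    target token, else "READ". Here a toy heuristic (emit one target
    token per source token read) replaces the actual LLM judgement."""
    return "WRITE" if len(target_prefix) < len(source_prefix) else "READ"

def simultaneous_generate(source_stream, translate_token, decide=llm_decide):
    """Interleave READ actions (consume one streaming source token) and
    WRITE actions (emit one target token), recording the resulting
    policy so latency could be analyzed afterwards."""
    pending = list(source_stream)   # tokens not yet read
    src, tgt, actions = [], [], []
    while len(tgt) < len(source_stream):
        # Must READ if nothing has been read yet; otherwise ask the policy.
        if pending and (not src or decide(src, tgt) == "READ"):
            src.append(pending.pop(0))
            actions.append("READ")
        else:
            # Toy 1:1 generation: translate the next unwritten source token.
            tgt.append(translate_token(src[len(tgt)]))
            actions.append("WRITE")
    return tgt, actions
```

With the heuristic above, READ and WRITE simply alternate; the point of LSG is that the LLM itself makes this decision adaptively, trading a little latency against output quality relative to the latency-minimal baseline.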
Problem

Research questions and friction points this paper is trying to address.

Simultaneous Generation
Large Language Models
Output Timing
Innovation

Methods, ideas, or system contributions that make the work stand out.

LSG Framework
Simultaneous Generation
Large Language Models
Shoutao Guo
Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (ICT/CAS); University of Chinese Academy of Sciences, Beijing, China
Shaolei Zhang
Institute of Computing Technology, Chinese Academy of Sciences (ICT/CAS)
Natural Language Processing · Large Language Model · Multimodal LLMs · Simultaneous Translation
Zhengrui Ma
Institute of Computing Technology, Chinese Academy of Sciences
Language Modeling
Yang Feng
Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (ICT/CAS); Key Laboratory of AI Safety, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing, China