MT-OSC: Path for LLMs that Get Lost in Multi-Turn Conversation

📅 2026-04-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the accuracy degradation, added latency, and computational overhead that large language models (LLMs) incur in multi-turn dialogues when the full chat history is appended to every prompt. The authors propose MT-OSC (One-off Sequential Condensation), a lightweight framework that condenses dialogue history in the background without interrupting the user. MT-OSC employs a Condenser Agent, comprising a few-shot inference-based Condenser and a lightweight Decider, to automatically and selectively retain essential information from the conversation history in a single pass. Evaluated across 13 LLMs and multiple multi-turn benchmarks, the method reduces token counts by up to 72% in 10-turn dialogues. It preserves or improves accuracy across datasets, narrowing the multi-turn performance gap, and remains robust to distractors and irrelevant turns.

📝 Abstract

Large language models (LLMs) suffer significant performance degradation when user instructions and context are distributed over multiple conversational turns, yet multi-turn (MT) interactions dominate chat interfaces. The routine approach of appending full chat history to prompts rapidly exhausts context windows, leading to increased latency, higher computational costs, and diminishing returns as conversations extend. We introduce MT-OSC, a One-off Sequential Condensation framework that efficiently and automatically condenses chat history in the background without disrupting the user experience. MT-OSC employs a Condenser Agent that uses a few-shot inference-based Condenser and a lightweight Decider to selectively retain essential information, reducing token counts by up to 72% in 10-turn dialogues. Evaluated across 13 state-of-the-art LLMs and diverse multi-turn benchmarks, MT-OSC consistently narrows the multi-turn performance gap, yielding improved or preserved accuracy across datasets while remaining robust to distractors and irrelevant turns. Our results establish MT-OSC as a scalable solution for multi-turn chats, enabling richer context within constrained input spaces, reducing latency and operational cost, while balancing performance.
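
The Condenser-plus-Decider idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the prompts, models, or decision rule, so everything here is an assumption made for illustration. `toy_condenser` is a hypothetical stand-in for the few-shot LLM Condenser (it simply keeps user turns), `toy_decider` is a hypothetical stand-in for the Decider (it accepts a condensed history only if it is non-empty and shorter), and `count_tokens` uses whitespace word counts as a crude proxy for a real tokenizer.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    role: str  # "user" or "assistant"
    text: str


def count_tokens(turns: List[Turn]) -> int:
    """Whitespace word count; a crude proxy for a real tokenizer (assumption)."""
    return sum(len(t.text.split()) for t in turns)


def toy_condenser(history: List[Turn]) -> List[Turn]:
    """Hypothetical stand-in for the few-shot LLM Condenser.

    Keeps user turns and merges them into one condensed turn. A real
    Condenser would call a language model instead.
    """
    kept = [t.text for t in history if t.role == "user"]
    return [Turn("user", " ".join(kept))] if kept else []


def toy_decider(original: List[Turn], condensed: List[Turn]) -> bool:
    """Hypothetical Decider rule: accept only a non-empty, shorter history."""
    return bool(condensed) and count_tokens(condensed) < count_tokens(original)


def condense_history(
    history: List[Turn],
    condenser: Callable[[List[Turn]], List[Turn]] = toy_condenser,
    decider: Callable[[List[Turn], List[Turn]], bool] = toy_decider,
) -> List[Turn]:
    """Return the condensed history if the decider accepts it, else the original."""
    candidate = condenser(history)
    return candidate if decider(history, candidate) else history


def build_prompt(history: List[Turn], new_user_message: str) -> str:
    """Build the next prompt from the (possibly condensed) history plus the new turn."""
    context = condense_history(history)
    lines = [f"{t.role}: {t.text}" for t in context]
    lines.append(f"user: {new_user_message}")
    return "\n".join(lines)


if __name__ == "__main__":
    history = [
        Turn("user", "Write a function that sums a list"),
        Turn("assistant", "Sure here is a long draft answer with many many words"),
        Turn("user", "It must ignore negative numbers"),
    ]
    print(build_prompt(history, "Also handle empty lists"))
```

In this sketch the fallback to the original history stands in for the role a Decider might play: condensation is applied only when it is judged safe, so a poor condensation never replaces the raw context. Both callables are injectable so that an LLM-backed condenser could be swapped in without changing the calling code.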

Problem

Research questions and friction points this paper is trying to address.

multi-turn conversation

large language models

context window

chat history

performance degradation

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-turn conversation

context condensation

large language models