Cross-Modal Memory Compression for Efficient Multi-Agent Debate

📅 2026-01-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses context inflation in multi-agent debate systems: as the number of rounds and agents grows, the accumulated debate history can exceed model context limits and drive up compute cost. The authors propose DebateOCR, a cross-modal compression framework that renders long textual debate histories as compact images and feeds them to later rounds through a dedicated vision encoder. On the theory side, they argue that diversity across agents lets the aggregate of the agents' compressed views approach the information bottleneck, so details dropped from one compressed history can be recovered from the others. Reported experiments show DebateOCR cutting input tokens by more than 92% across multiple benchmarks, with substantially lower compute cost and faster inference.
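To make the context-inflation argument concrete, here is a minimal token-accounting sketch. It is not the authors' code: the function names, the assumption that every agent rereads all earlier turns each round, the fixed `tokens_per_turn`, and the example numbers are all illustrative. The only figure taken from the source is the reported reduction of more than 92%, used here as a flat percentage.

```python
def full_history_tokens(agents: int, rounds: int, tokens_per_turn: int) -> int:
    """Input tokens consumed across a debate when each agent rereads
    the full textual history of all earlier rounds (assumed protocol)."""
    total = 0
    for r in range(1, rounds + 1):
        # History visible in round r: every agent's turn from rounds 1..r-1.
        history = agents * (r - 1) * tokens_per_turn
        # Each of the agents reads that history once in this round.
        total += agents * history
    return total


def compressed_tokens(full_tokens: int, reduction_pct: int = 92) -> int:
    """Tokens left after a flat percentage reduction (92 is the lower
    bound on the reduction reported in the abstract)."""
    return full_tokens * (100 - reduction_pct) // 100


if __name__ == "__main__":
    # Hypothetical setting: 4 agents, 5 rounds, 1,000 tokens per turn.
    full = full_history_tokens(agents=4, rounds=5, tokens_per_turn=1000)
    print(full)                     # 160000
    print(compressed_tokens(full))  # 12800
```

Under these assumptions the input cost grows quadratically with both the number of agents and the number of rounds, which is why a fixed-ratio reduction matters more as debates get longer.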

📝 Abstract

Multi-agent debate can improve reasoning quality and reduce hallucinations, but it incurs rapidly growing context as debate rounds and agent count increase. Retaining full textual histories leads to token usage that can exceed context limits and often requires repeated summarization, adding overhead and compounding information loss. We introduce DebateOCR, a cross-modal compression framework that replaces long textual debate traces with compact image representations, which are then consumed through a dedicated vision encoder to condition subsequent rounds. This design compresses histories that commonly span tens to hundreds of thousands of tokens, cutting input tokens by more than 92% and yielding substantially lower compute cost and faster inference across multiple benchmarks. We further provide a theoretical perspective showing that diversity across agents supports recovery of omitted information: although any single compressed history may discard details, aggregating multiple agents' compressed views allows the collective representation to approach the information bottleneck with exponentially high probability.
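The intuition behind the diversity claim can be illustrated with a simple worked calculation. This is a sketch under assumptions that are not stated in the abstract, and it is not the paper's theorem: suppose each of $n$ agents' compressed views omits a given detail independently with the same probability $p < 1$ (both $p$ and the independence assumption are hypothetical).

```latex
\[
\Pr[\text{detail omitted by all } n \text{ views}] = p^{n},
\qquad
\Pr[\text{detail retained by at least one view}] = 1 - p^{n}.
\]
% Example with hypothetical values p = 0.3, n = 4:
\[
1 - 0.3^{4} = 1 - 0.0081 = 0.9919.
\]
```

Because $p^{n}$ decays exponentially in $n$, the chance that a detail is lost from every view shrinks quickly as agents are added, which is consistent with the "exponentially high probability" wording above.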

Problem

Research questions and friction points this paper is trying to address.

multi-agent debate

context compression

cross-modal representation

token efficiency

information loss

Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-modal compression

multi-agent debate

vision encoder