🤖 AI Summary
This work investigates the post-hoc mergeability of hidden states in Structured State Space Models (SSMs). We propose "document souping": encoding long documents independently in chunks and aggregating their hidden states via lightweight operations (e.g., averaging) into a single contextual representation, without joint encoding or reprocessing. We find that finetuned Mamba2-style SSMs exhibit *soupability*: their hidden states admit effective, length-agnostic aggregation, decoupling long-context modeling from the cost of jointly reprocessing the full input for every query. On HotpotQA, souping ten independently encoded documents nearly matches the performance of a cross-encoder trained on the same inputs, while maintaining strong accuracy on multi-hop QA, long-document reasoning, and sparse retrieval, demonstrating a favorable trade-off between efficiency and effectiveness.
📝 Abstract
We investigate whether hidden states from Structured State Space Models (SSMs) can be merged post-hoc to support downstream reasoning. Inspired by model souping, we propose a strategy where documents are encoded independently and their representations are pooled, via simple operations like averaging, into a single context state. This approach, which we call document souping, enables modular encoding and reuse without reprocessing the full input for each query. We finetune Mamba2 models to produce soupable representations and find that they support multi-hop QA, sparse retrieval, and long-document reasoning with strong accuracy. On HotpotQA, souping ten independently encoded documents nearly matches the performance of a cross-encoder trained on the same inputs.
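The aggregation step described above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the encoder is stubbed with random states, and the state shape `(n_layers, d_state)` is an assumption about what an SSM exposes per layer. The only part taken from the text is the merge itself, element-wise averaging of independently computed states.

```python
import numpy as np

N_LAYERS, D_STATE = 4, 16  # assumed SSM state dimensions (illustrative)

def encode_chunk(rng: np.random.Generator) -> np.ndarray:
    """Stand-in for running an SSM encoder over one document chunk.
    In the real setting, this would be a forward pass of a finetuned
    Mamba2 model returning its final per-layer hidden states."""
    return rng.standard_normal((N_LAYERS, D_STATE))

def soup(states: list[np.ndarray]) -> np.ndarray:
    """Merge independently computed hidden states into one context state
    by element-wise averaging (the 'document souping' operation)."""
    return np.mean(np.stack(states, axis=0), axis=0)

rng = np.random.default_rng(0)
# Ten independently encoded documents, mirroring the HotpotQA setup.
doc_states = [encode_chunk(rng) for _ in range(10)]
context_state = soup(doc_states)  # shape (N_LAYERS, D_STATE)
```

Because each document is encoded in isolation, the per-document states can be cached and re-souped in any combination at query time, which is what makes the encoding modular and reusable.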