🤖 AI Summary
This work addresses the limitations of inefficient textual communication in multi-model collaboration, which often leads to information loss and constrained coordination. The authors propose a parameter-efficient method that aligns the key-value (K-V) caches of multiple large language models within a shared latent space without modifying their pre-trained parameters. By introducing lightweight adapters, the approach enables efficient cross-model translation of internal states. This is the first method to achieve latent-space alignment of K-V caches across distinct models, facilitating high-bandwidth communication and direct transfer of capabilities such as soft prompting. Experiments with the Gemma-2 model family demonstrate that the framework not only enables seamless inter-model collaboration and knowledge sharing but also significantly enhances individual model performance on downstream tasks, thereby validating the effectiveness and scalability of latent-space collaborative mechanisms.
📝 Abstract
Solving increasingly complex problems with large language models (LLMs) necessitates a move beyond individual models and towards multi-model systems that can effectively collaborate. While text has traditionally served as the medium for inter-model communication, a richer and more efficient exchange is possible if models can access each other's internal states directly. In this paper, we propose learning a shared representation space that aligns the k-v caches of multiple models, creating a high-bandwidth channel for collaboration without altering the underlying pre-trained parameters. We do so by augmenting each model with adapters to translate its state into and out of this shared space. Via a suite of experiments with Gemma-2 models, we demonstrate that this approach not only enables seamless inter-model communication but also improves individual model performance. We also show that the shared space allows for the direct transfer of learned skills, such as soft prompts, between different models. Our work represents a significant step towards a future where models can fluidly share knowledge and capabilities.