🤖 AI Summary
This work proposes a way to improve reasoning performance without fine-tuning or otherwise updating the parameters of large language models (LLMs). The method builds a feedforward graph of heterogeneous frozen LLMs in which signals propagate through a shared continuous latent space via lightweight trainable linear projections, so the whole graph can be optimized end to end. The experiments verify empirically that gradient flow across multiple frozen model boundaries is tractable, and show that the output node develops selective routing behavior without explicit supervision. With only 17.6 million trainable parameters, the approach reaches 87.3% on ARC-Challenge, 82.8% on OpenBookQA, and 67.2% on MMLU, outperforming the best single constituent model by 11.4, 6.2, and 1.2 percentage points and parameter-matched learned classifiers on frozen single models by 9.1, 5.2, and 6.7 points.
📝 Abstract
We present a feedforward graph architecture in which heterogeneous frozen large language models serve as computational nodes, communicating through a shared continuous latent space via learned linear projections. Building on recent work demonstrating geometric compatibility between independently trained LLM latent spaces~\cite{armstrong2026thinking}, we extend this finding from static two-model steering to end-to-end trainable multi-node graphs, where projection matrices are optimized jointly via backpropagation through residual stream injection hooks. Three small frozen models (Llama-3.2-1B, Qwen2.5-1.5B, Gemma-2-2B) encode the input into a shared latent space. The aggregate signal from that space is injected into two larger frozen models (Phi-3-mini, Mistral-7B), whose representations feed a lightweight cross-attention output node. With only 17.6M trainable parameters against approximately 12B frozen, the architecture achieves 87.3\% on ARC-Challenge, 82.8\% on OpenBookQA, and 67.2\% on MMLU, outperforming the best single constituent model by 11.4, 6.2, and 1.2 percentage points, respectively, and outperforming parameter-matched learned classifiers on frozen single models by 9.1, 5.2, and 6.7 points. Gradient flow through multiple frozen model boundaries is empirically verified to be tractable, and the output node develops selective routing behavior across layer-2 nodes without explicit supervision.
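The core mechanism in the abstract, training only a projection while gradients pass through frozen nodes, can be illustrated with a toy example. The sketch below is not the authors' implementation: the two "models" are small fixed tanh layers standing in for frozen LLMs, the squared-error loss replaces the cross-attention output node, and every name (`A`, `B`, `R0`, `forward`, `grad_P`, `train`) is hypothetical. It shows only that a projection `P` between frozen components receives a well-defined gradient, checked against finite differences, and that updating `P` alone reduces the loss.

```python
import copy
import math
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)

# Frozen stand-ins for two models of different widths (assumed toy weights).
A = rand_matrix(3, 4, rng)          # "layer-1 node": 4-d input -> 3-d hidden state
B = rand_matrix(2, 5, rng)          # "layer-2 node": 5-d residual stream -> 2-d output
X = [0.5, -0.2, 0.1, 0.9]           # toy input
R0 = [0.1, 0.0, -0.3, 0.2, 0.4]     # layer-2 node's own residual stream (assumed)
TARGET = [1.0, -1.0]                # toy regression target


def forward(P):
    """Run input -> frozen A -> trainable P -> residual injection -> frozen B."""
    h = [math.tanh(a) for a in matvec(A, X)]
    r = [base + delta for base, delta in zip(R0, matvec(P, h))]
    y = [math.tanh(b) for b in matvec(B, r)]
    loss = 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, TARGET))
    return loss, h, y


def grad_P(P):
    """Analytic dLoss/dP. The gradient passes through B, but B is never updated."""
    _, h, y = forward(P)
    d_pre = [(yi - ti) * (1.0 - yi * yi) for yi, ti in zip(y, TARGET)]
    d_r = [sum(B[i][k] * d_pre[i] for i in range(len(B))) for k in range(len(R0))]
    return [[d_r[k] * hj for hj in h] for k in range(len(R0))]


def numeric_grad(P, k, j, eps=1e-6):
    """Central finite difference for one entry of P, used as a sanity check."""
    plus, minus = copy.deepcopy(P), copy.deepcopy(P)
    plus[k][j] += eps
    minus[k][j] -= eps
    return (forward(plus)[0] - forward(minus)[0]) / (2 * eps)


def train(P, steps=500, lr=0.02):
    """Plain gradient descent on the projection only; A and B stay fixed."""
    P = copy.deepcopy(P)
    for _ in range(steps):
        g = grad_P(P)
        P = [[p - lr * gp for p, gp in zip(prow, grow)] for prow, grow in zip(P, g)]
    return P


P0 = rand_matrix(5, 3, rng)         # the only trainable parameters in this toy
P_trained = train(P0)
print("loss before training:", forward(P0)[0])
print("loss after training: ", forward(P_trained)[0])
```

In the architecture described above, the frozen nodes are full LLMs, injection happens in their residual streams via hooks, and several projections are trained jointly, so an autodiff framework would replace the hand-written gradient used here.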