COLT: Lightweight Multi-LLM Collaboration through Shared MCTS Reasoning for Model Compilation

📅 2026-02-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes COLT, a lightweight multi-LLM collaborative framework designed to reduce the high inference cost of large language models (LLMs) in compiler optimization while mitigating the reliability limitations of smaller models. COLT leverages a shared Monte Carlo Tree Search (MCTS) as a coordination backbone to enable cross-model inference reuse and value propagation. It incorporates model-aware tree policies and a curriculum-based adjustment mechanism to jointly select actions—comprising both optimization transformations and the next participating model. Compared to single-model approaches, COLT achieves comparable or superior performance with significantly lower computational overhead, all while avoiding the reliance on complex external components typical of conventional multi-agent systems.

📝 Abstract
Model serving costs dominate AI systems, making compiler optimization essential for scalable deployment. Recent work shows that a large language model (LLM) can guide compiler search by reasoning over program structure and optimization history. However, using a single large model throughout the search is expensive, while smaller models are less reliable on their own. This paper therefore asks whether multi-LLM collaborative reasoning that relies primarily on small LLMs can match or exceed the performance of a single large model. To this end, we propose a lightweight collaborative multi-LLM framework, dubbed COLT, for compiler optimization that coordinates reasoning across multiple models within a single Monte Carlo tree search (MCTS) process. A key contribution is the use of one shared MCTS tree as the collaboration substrate across LLMs, enabling the reuse of transformation prefixes and cross-model value propagation. By endogenizing model selection within the lightweight MCTS optimization loop, we circumvent both heavy internal reasoning mechanisms and the conventional agentic machinery of external planners, multiple concurrent LLMs, databases, external memory and versioning of intermediate results, and controllers. At every iteration, the acting LLM proposes a joint action: (compiler transformation, model to be queried next). We also introduce a model-aware tree policy that biases search toward smaller models while preserving exploration, and a course-alteration mechanism that escalates to the largest model when the search exhibits persistent regressions attributable to smaller models.
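The abstract does not give COLT's actual scoring formula, so the joint action selection can only be sketched. The snippet below is a minimal, hypothetical illustration assuming a standard UCT score augmented with a bonus inversely proportional to the cost of the model each child would query next (all names, cost values, and the `escalate` flag are illustrative assumptions, not the paper's implementation):

```python
import math

# Hypothetical sketch of a model-aware tree policy in the spirit of COLT:
# each child edge carries a joint action (compiler transformation, next LLM),
# and selection is UCT plus a bonus favoring cheaper models. An escalation
# flag restricts the pool to the largest model after persistent regressions.

SMALL, MEDIUM, LARGE = "small", "medium", "large"
MODEL_COST = {SMALL: 1.0, MEDIUM: 4.0, LARGE: 16.0}  # assumed relative costs

class Node:
    def __init__(self, transform, next_model):
        self.transform = transform    # compiler transformation for this edge
        self.next_model = next_model  # LLM to query if this child is chosen
        self.visits = 0
        self.value = 0.0              # cumulative reward (e.g., speedup)

def uct_score(child, parent_visits, c=1.4, beta=0.5):
    """UCT exploitation + exploration, plus a cost-aware cheapness bonus."""
    if child.visits == 0:
        return float("inf")  # always try unvisited children first
    exploit = child.value / child.visits
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)
    cheapness = beta / MODEL_COST[child.next_model]  # bias toward small LLMs
    return exploit + explore + cheapness

def select(children, parent_visits, escalate=False):
    """Pick the next joint action (transformation, model to query next)."""
    if escalate:  # persistent regressions: fall back to the largest model
        pool = [ch for ch in children if ch.next_model == LARGE] or children
    else:
        pool = children
    return max(pool, key=lambda ch: uct_score(ch, parent_visits))
```

With equal visit counts and values, the cheapness bonus makes the policy prefer a child that routes to a small model, while `escalate=True` overrides that preference and hands control to the largest model, mirroring the course-alteration mechanism described above.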
Problem

Research questions and friction points this paper is trying to address.

multi-LLM collaboration
compiler optimization
model serving cost
lightweight reasoning
shared MCTS
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-LLM collaboration
shared MCTS
compiler optimization
lightweight reasoning
model-aware tree policy