🤖 AI Summary
This work addresses the computational redundancy of large monolithic language models in complex reasoning. It also targets the limitations of existing alternatives: static pipelines are prone to error propagation, while dynamic architectures suffer from trajectory divergence and memory bloat. The authors propose a lightweight multi-model interaction framework that time-division multiplexes PEFT adapters over a shared base model. A confidence-driven hierarchical self-repair mechanism reconfigures the topology dynamically, applying fine-grained patching or subgraph reconstruction as needed. Running on a single consumer-grade GPU, an 8B-parameter instantiation of the framework achieves performance on StrategyQA (87.6%), MATH (82.7%), and FinQA comparable to that of a 72B model, while reducing latency and token consumption by up to 68.1% and 68.6%, respectively, compared to unconstrained dynamic architectures.
📝 Abstract
Tackling complex reasoning tasks typically relies on massive monolithic LLMs, which suffer from severe computational redundancy. Task decomposition through structured pipelines or multi-agent collaboration offers an alternative, but these approaches face a dilemma: predefined static topologies are highly vulnerable to cascading errors, whereas unconstrained dynamic agents suffer from trajectory divergence and unpredictable memory bloat. To address this, we present DynaGraph, a lightweight multi-model framework driven by dynamic topological reconfiguration. At the execution level, DynaGraph time-division multiplexes PEFT adapters over a shared base model, enabling both full system training and inference deployment on a single consumer-grade GPU. At the routing level, the Evaluator continuously monitors execution confidence to trigger hierarchical self-healing: Fine-grained Patching for localized data gaps and Subgraph Reconstruction for severe logical ruptures. Experiments on StrategyQA, MATH, and FinQA demonstrate that our 8B model closely approximates the reasoning capabilities of a 72B monolithic model (e.g., 87.6% on StrategyQA, 82.7% on MATH). Furthermore, it reduces latency by up to 68.1% and token consumption by 68.6% compared to unconstrained dynamic architectures.
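The routing idea in the abstract can be illustrated with a minimal sketch. It is not DynaGraph's actual implementation. The abstract only says that the Evaluator monitors execution confidence and triggers one of two repair tiers. The two numeric thresholds, the names `Repair` and `choose_repair`, and the mapping from a scalar confidence to a tier are all assumptions made for illustration.

```python
from enum import Enum


class Repair(Enum):
    """Repair tiers named in the abstract, plus a no-op."""
    NONE = "continue"
    PATCH = "fine_grained_patching"
    REBUILD = "subgraph_reconstruction"


# Hypothetical thresholds: the abstract does not state the actual values.
PATCH_THRESHOLD = 0.7    # below this, assume a localized data gap
REBUILD_THRESHOLD = 0.4  # below this, assume a severe logical rupture


def choose_repair(confidence: float) -> Repair:
    """Map an evaluator confidence in [0, 1] to a repair tier (assumed rule)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if confidence < REBUILD_THRESHOLD:
        return Repair.REBUILD  # escalate: rebuild the affected subgraph
    if confidence < PATCH_THRESHOLD:
        return Repair.PATCH    # local fix: patch the current node only
    return Repair.NONE         # confident enough: keep executing


# Example: three nodes with decreasing confidence.
decisions = [choose_repair(c).value for c in (0.9, 0.55, 0.2)]
```

Under these assumed thresholds, the three example confidences map to `continue`, `fine_grained_patching`, and `subgraph_reconstruction`, in that order. The hierarchy matters because the cheaper local patch is tried before the more expensive reconstruction.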