🤖 AI Summary
Problem: Existing approaches to heterogeneous, multidisciplinary reasoning with large language models (LLMs), such as mixture-of-experts, route at the coarse task level and therefore fail to capture the fine-grained dependencies between the disciplines a single question draws on.
Method: We propose a Subject-based Directed Acyclic Graph (S-DAG), in which a graph neural network decomposes each question into subject nodes and infers the dependencies between them. Each subject is then matched to the LLM with the highest subject-specific expertise score, and the matched models collaborate as agents along the S-DAG, enabling knowledge-driven, precise routing and cooperation.
Contribution/Results: By replacing task-level partitioning with subject-level routing, the method achieves significant gains in both accuracy and reasoning efficiency on multi-subject subsets of MMLU-Pro, GPQA, and MedMCQA, while demonstrating strong generalization and computational scalability.
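Subject-to-model matching can be pictured as choosing, for each subject node, the model with the highest profiled expertise score. The sketch below is a minimal illustration under stated assumptions: the model names, subjects, and scores are hypothetical placeholders, and a plain per-subject argmax stands in for whatever exact matching procedure the paper uses.

```python
# Hypothetical expertise scores: model name -> {subject: score}.
# The paper obtains such scores by profiling each LLM; these numbers are made up.
EXPERTISE = {
    "model_a": {"physics": 0.72, "chemistry": 0.55, "biology": 0.61},
    "model_b": {"physics": 0.64, "chemistry": 0.70, "biology": 0.58},
    "model_c": {"physics": 0.60, "chemistry": 0.62, "biology": 0.75},
}


def match_subjects(subjects, expertise):
    """Assign each subject to the model with the highest expertise score for it."""
    assignment = {}
    for subject in subjects:
        # A model with no score for this subject is treated as scoring 0.0;
        # ties go to the first model in dictionary order.
        best = max(expertise, key=lambda m: expertise[m].get(subject, 0.0))
        assignment[subject] = best
    return assignment


print(match_subjects(["physics", "chemistry", "biology"], EXPERTISE))
# {'physics': 'model_a', 'chemistry': 'model_b', 'biology': 'model_c'}
```

Because each subject is matched independently, one question can be routed to several different models, which is the key difference from picking a single model per task.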
📝 Abstract
Large Language Models (LLMs) have achieved impressive performance on complex reasoning problems, but their effectiveness depends heavily on the nature of the task, especially the domain knowledge it requires. Existing approaches, such as mixture-of-experts, typically operate at the task level and are too coarse to solve heterogeneous problems that involve multiple subjects. This work proposes a novel framework that performs fine-grained, subject-level analysis and couples it with a dedicated multi-agent collaboration strategy for heterogeneous problem reasoning. Specifically, given an input query, we first employ a Graph Neural Network to identify the relevant subjects and infer their interdependencies, producing a Subject-based Directed Acyclic Graph (S-DAG) in which nodes represent subjects and edges encode information flow. We then profile the candidate LLMs by assigning each a subject-specific expertise score, and match each subject in the S-DAG with the model that scores highest on it. This subject-model matching enables graph-structured multi-agent collaboration, in which information flows along the S-DAG from the starting model to the ending model. We curate and release multi-subject subsets of standard benchmarks (MMLU-Pro, GPQA, MedMCQA) to better reflect complex, real-world reasoning tasks. Extensive experiments show that our approach significantly outperforms existing task-level model selection and multi-agent collaboration baselines in both accuracy and efficiency. These results highlight the effectiveness of subject-aware reasoning and structured collaboration for complex, multi-subject problems.
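The graph-structured collaboration over the S-DAG can be sketched as a topological traversal: each subject's assigned model runs only after the models for its upstream subjects, and receives their outputs as context. This is a minimal sketch under stated assumptions. The S-DAG, the model names, and the `call_model` stand-in are hypothetical, and the paper's actual prompting and answer aggregation are not shown. `graphlib.TopologicalSorter` (Python 3.9+) supplies the ordering and raises `CycleError` if the graph is not acyclic.

```python
from graphlib import TopologicalSorter

# Hypothetical S-DAG for one query: each subject maps to the set of subjects it
# depends on (its predecessors); an edge u -> v means information flows u to v.
S_DAG = {
    "chemistry": set(),
    "biology": {"chemistry"},
    "medicine": {"chemistry", "biology"},
}

# Hypothetical subject-to-model assignment, e.g. from expertise-score matching.
ASSIGNMENT = {"chemistry": "model_b", "biology": "model_c", "medicine": "model_d"}


def call_model(model, subject, query, context):
    """Stand-in for a real LLM call that reports which upstream subjects it received.

    A real agent would prompt `model` with `query` plus the upstream outputs in
    `context`; here only the context keys are recorded to keep the trace short.
    """
    return f"{model}[{subject}] <- {sorted(context)}"


def run_s_dag(query, dag, assignment):
    """Run each subject's agent in topological order so upstream outputs exist."""
    outputs = {}
    for subject in TopologicalSorter(dag).static_order():
        # Every predecessor was visited earlier, so its output is available.
        context = {p: outputs[p] for p in dag.get(subject, set())}
        outputs[subject] = call_model(assignment[subject], subject, query, context)
    # Sink nodes (no successors) play the role of the "ending" model.
    sinks = [s for s in outputs if not any(s in preds for preds in dag.values())]
    return outputs, sinks


outputs, sinks = run_s_dag("example question", S_DAG, ASSIGNMENT)
for trace in outputs.values():
    print(trace)
print("final answer from:", sinks)
```

With this graph the only valid order is chemistry, biology, medicine, so the trace shows `model_d` receiving both upstream results and acting as the ending model.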