Weak-for-Strong: Training Weak Meta-Agent to Harness Strong Executors

📅 2025-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Customizing large language models (LLMs) is expensive, and existing training-free methods either require manual intervention or deliver suboptimal performance. To address this, the paper proposes a "weak meta-agent harnesses strong executors" framework: a lightweight 7B model serves as the meta-agent, workflow design is modeled as a multi-turn Markov decision process, and reinforcement learning for agentic workflow optimization (RLAO) enables fully automated evolution of multi-step reasoning pipelines with no human intervention. The method supports zero-shot transfer across tasks and outperforms the strongest baselines by 2.9-24.6% on average across 11 benchmarks, substantially improving the performance of GPT-3.5-Turbo and GPT-4o while requiring only one GPU-hour of training, demonstrating strong efficiency, generalizability, and practical applicability.

📝 Abstract
Efficiently leveraging the capabilities of contemporary large language models (LLMs) is increasingly challenging, particularly when direct fine-tuning is expensive and often impractical. Existing training-free methods, including manually designed and automatically generated workflows, typically demand substantial human effort or yield suboptimal results. This paper proposes Weak-for-Strong Harnessing (W4S), a novel framework that customizes smaller, cost-efficient language models to design and optimize workflows for harnessing stronger models. W4S formulates workflow design as a multi-turn Markov decision process and introduces reinforcement learning for agentic workflow optimization (RLAO) to train a weak meta-agent. Through iterative interaction with the environment, the meta-agent learns to design increasingly effective workflows without manual intervention. Empirical results demonstrate the superiority of W4S: a 7B meta-agent, trained with just one GPU hour, outperforms the strongest baseline by 2.9-24.6% across eleven benchmarks, successfully elevating the performance of state-of-the-art models such as GPT-3.5-Turbo and GPT-4o. Notably, W4S exhibits strong generalization across both seen and unseen tasks, offering an efficient, high-performing alternative to directly fine-tuning strong models.
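The abstract describes the approach at a high level: the weak meta-agent treats workflow design as a multi-turn MDP, proposing a workflow (action), observing the strong executor's validation performance (reward), and updating from that feedback. A minimal toy sketch of this loop follows; all names, the candidate-workflow list, the fake accuracies, and the reward-weighted preference update are illustrative assumptions, not the paper's actual RLAO algorithm or workflow representation (in W4S the meta-agent is a 7B LLM that writes executable workflow code).

```python
import random

# Hypothetical stand-in: the "workflow" is a choice among a few fixed
# multi-step pipelines rather than generated code.
CANDIDATE_WORKFLOWS = ["direct", "plan-then-solve", "self-refine"]


def execute_with_strong_model(workflow):
    """Stand-in for running the strong executor (e.g. GPT-4o) through the
    chosen workflow and scoring it on a validation set. Accuracies are
    faked so the sketch runs without API calls."""
    fake_accuracy = {"direct": 0.60, "plan-then-solve": 0.72, "self-refine": 0.78}
    return fake_accuracy[workflow]


def train_meta_agent(turns=20, lr=0.5, seed=0):
    """Multi-turn loop: sample a workflow proportionally to a preference
    table, observe the reward, and reinforce the chosen workflow. This
    simple reward-weighted update is an assumption standing in for the
    paper's RLAO objective."""
    rng = random.Random(seed)
    prefs = {w: 1.0 for w in CANDIDATE_WORKFLOWS}
    history = []
    for _ in range(turns):
        # Sample an action with probability proportional to its preference.
        total = sum(prefs.values())
        r = rng.uniform(0, total)
        action = CANDIDATE_WORKFLOWS[-1]  # fallback for float edge cases
        acc = 0.0
        for w, p in prefs.items():
            acc += p
            if r <= acc:
                action = w
                break
        reward = execute_with_strong_model(action)
        prefs[action] += lr * reward  # reinforce high-reward workflows
        history.append((action, reward))
    best = max(prefs, key=prefs.get)
    return best, history


if __name__ == "__main__":
    best_workflow, history = train_meta_agent()
    print(best_workflow, len(history))
```

Because the update is rich-get-richer, the loop tends to concentrate on high-reward workflows over turns; the real system additionally conditions proposals on the interaction history rather than keeping a flat preference table.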
Problem

Research questions and friction points this paper is trying to address.

Efficiently leveraging large language models without fine-tuning
Automating workflow design for stronger model performance
Training weak meta-agents to optimize strong executors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Weak-for-Strong Harnessing (W4S) framework
Reinforcement learning for workflow optimization
Cost-efficient 7B meta-agent trained in one GPU hour