CodeAgents: A Token-Efficient Framework for Codified Multi-Agent Reasoning in LLMs

📅 2025-07-03
🤖 AI Summary
Existing LLM-agent prompting frameworks primarily target single-agent planning and emphasize task accuracy, neglecting token efficiency, modularity, and scalability in multi-agent settings. Method: We propose CodeAgents, the first framework to integrate control structures (e.g., conditionals, loops, Boolean logic) and typed variables into multi-agent prompt engineering. It explicitly models task workflows and role decomposition via structured pseudocode, enabling modular collaboration, dynamic feedback, and tool invocation. Contribution/Results: This design significantly enhances interpretability, formal verifiability, and system scalability. On GAIA, HotpotQA, and VirtualHome benchmarks, CodeAgents improves task success rates by 3–36 percentage points (reaching 56% on VirtualHome), while reducing input and output tokens by 55–87% and 41–70%, respectively.

📝 Abstract
Effective prompt design is essential for improving the planning capabilities of large language model (LLM)-driven agents. However, existing structured prompting strategies are typically limited to single-agent, plan-only settings, and often evaluate performance solely based on task accuracy - overlooking critical factors such as token efficiency, modularity, and scalability in multi-agent environments. To address these limitations, we introduce CodeAgents, a prompting framework that codifies multi-agent reasoning and enables structured, token-efficient planning in multi-agent systems. In CodeAgents, all components of agent interaction - Task, Plan, Feedback, system roles, and external tool invocations - are codified into modular pseudocode enriched with control structures (e.g., loops, conditionals), boolean logic, and typed variables. This design transforms loosely connected agent plans into cohesive, interpretable, and verifiable multi-agent reasoning programs. We evaluate the proposed framework across three diverse benchmarks - GAIA, HotpotQA, and VirtualHome - using a range of representative LLMs. Results show consistent improvements in planning performance, with absolute gains of 3-36 percentage points over natural language prompting baselines. On VirtualHome, our method achieves a new state-of-the-art success rate of 56%. In addition, our approach reduces input and output token usage by 55-87% and 41-70%, respectively, underscoring the importance of token-aware evaluation metrics in the development of scalable multi-agent LLM systems. The code and resources are available at: https://anonymous.4open.science/r/CodifyingAgent-5A86
Problem

Research questions and friction points this paper is trying to address.

Multi-agent planning lacks token-efficient, structured prompting strategies
Current LLM-agent frameworks are limited in modularity and scalability
Multi-agent reasoning is difficult to interpret and formally verify
Innovation

Methods, ideas, or system contributions that make the work stand out.

Codifies multi-agent reasoning into modular pseudocode
Uses control structures and typed variables
Reduces token usage significantly
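To make the codification idea concrete, here is a minimal Python sketch of what a "codified" agent plan might look like, in contrast to free-form natural-language steps. All names (`TOOLS`, `solver_agent`, the decomposition into sub-questions) are hypothetical illustrations, not the paper's actual implementation:

```python
from typing import Callable, Dict, List

# Hypothetical tool registry: tool names mapped to callables (illustrative stubs).
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda query: f"results for '{query}'",
}

def solver_agent(task: str) -> List[str]:
    """A plan codified with typed variables, a loop, and a boolean gate,
    rather than a loose natural-language instruction list."""
    findings: List[str] = []
    # Decompose the task into sub-questions (a loop over sub-tasks).
    sub_questions: List[str] = [f"{task} (hop {i})" for i in range(1, 3)]
    for q in sub_questions:
        evidence: str = TOOLS["search"](q)  # external tool invocation
        if evidence.startswith("results"):  # conditional feedback check
            findings.append(evidence)
    return findings

print(solver_agent("Who directed the film?"))
```

The intuition is that compact control structures (the loop and conditional above) replace many tokens of repeated natural-language instruction, which is where the reported input/output token savings would come from.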