🤖 AI Summary
This work challenges the assumed necessity of architectural complexity in language model agents, asking whether long-context language models (LCLMs) alone can suffice for demanding software engineering tasks, such as those in SWE-bench, without multi-step retrieval, multi-agent coordination, or custom scaffolding. We propose a "zero-tool, zero-scaffolding" paradigm: directly concatenating the full environment context into the model's input, augmented by task-specific prompt engineering and, in one variant, a two-stage collaboration between Gemini-1.5-Pro and Claude-3.7. Our experiments question the prevailing "more complex is better" design heuristic. Results show that unscaffolded Gemini-1.5-Pro achieves a 38% SWE-Bench-Verified solve rate, 6 percentage points above a carefully tuned agent-scaffold baseline (32%), though still short of the strongest agentic architectures; Gemini-2.5-Pro with the same unscaffolded approach reaches 50.8%, and the two-stage variant attains 48.6%. These findings demonstrate that minimalist, context-centric architectures can deliver competitive performance on real-world, complex software engineering tasks.
📝 Abstract
Recent advances in language model (LM) agents have demonstrated significant potential for automating complex real-world tasks. To make progress on these difficult tasks, LM agent architectures have become increasingly complex, often incorporating multi-step retrieval tools, multiple agents, and scaffolding adapted to the underlying LM. In this work, we investigate whether all of this complexity is necessary, or if parts of these scaffolds can be removed on challenging tasks like SWE-bench. We show that in the case of SWE-bench, simply putting the entire environment into the context of a long context language model (LCLM) and properly prompting the model makes it competitive with carefully tuned, complex agent scaffolds. We show that a Gemini-1.5-Pro model without any scaffolding or tools achieves 38% on SWE-Bench-Verified, comparable with approaches using carefully tuned agent scaffolds (32%). While the unscaffolded approach with Gemini-1.5-Pro falls short of the strongest agentic architectures, we demonstrate that the more capable Gemini-2.5-Pro using the same unscaffolded approach directly attains a 50.8% solve rate. Additionally, a two-stage approach combining Gemini-1.5-Pro with Claude-3.7 achieves a competitive 48.6% solve rate.
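The core recipe, putting the entire environment into the model's context, is simple enough to sketch. The following is a minimal, hypothetical illustration, not the authors' implementation: it walks a repository, concatenates the issue text and all Python source files into a single prompt, and uses a crude character budget as a stand-in for the LCLM's token limit. The function name, prompt layout, and budget are all invented for illustration.

```python
import os

def build_context_prompt(repo_dir: str, issue_text: str, max_chars: int = 2_000_000) -> str:
    """Concatenate the GitHub issue and every Python file in the repo
    into one prompt for a long-context model.

    max_chars is a crude character budget standing in for the model's
    real token limit (a production version would count tokens instead).
    """
    parts = [f"# GitHub issue\n{issue_text}\n\n# Repository files\n"]
    for root, _dirs, files in os.walk(repo_dir):
        for name in sorted(files):
            if not name.endswith(".py"):
                continue  # this sketch only includes Python sources
            path = os.path.join(root, name)
            rel = os.path.relpath(path, repo_dir)
            with open(path, encoding="utf-8", errors="replace") as f:
                parts.append(f"\n### File: {rel}\n{f.read()}")
    parts.append("\n# Task\nProduce a unified diff that resolves the issue above.")
    prompt = "".join(parts)
    # Naive truncation if the repository overflows the budget; the paper's
    # point is that modern LCLM context windows make this rarely necessary.
    return prompt[:max_chars]
```

The resulting string would then be sent as a single request to the LCLM, with no retrieval step, tools, or agent loop in between.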