🤖 AI Summary
Current large language models (LLMs) perform poorly on open-domain tasks that require complex domain-specific computation or dynamic tool invocation, and standardized benchmarks for evaluating them are lacking. This paper introduces GitAgent, an LLM-based autonomous agent that discovers, understands, adapts, and integrates tools directly from GitHub repositories through an end-to-end closed-loop pipeline: tool discovery (via issue/PR retrieval), comprehension (repository-level semantic modeling), adaptation (automatic API generation), and integration (hybrid RAG and fine-tuning). Crucially, it is the first to let LLMs learn tools from authentic open-source collaboration data rather than from synthetic or manually annotated examples. Evaluated on 30 real-world, user-specified complex queries, GitAgent achieves a 69.4% average success rate, substantially outperforming fixed-tool baselines. Additionally, the authors construct the first open-domain benchmark subset specifically designed for evaluating tool-augmented LLMs.
📝 Abstract
While Large Language Models (LLMs) such as ChatGPT and GPT-4 have demonstrated exceptional proficiency in natural language processing, their efficacy on complex, multifaceted tasks remains limited. A growing area of research focuses on LLM-based agents equipped with external tools that let them perform diverse tasks. However, existing LLM-based agents support only a limited set of tools, which cannot cover the full range of user queries, especially those involving specialized domains. Extending their tool sets autonomously in response to varied user queries remains a challenge for LLM-based agents. Because GitHub hosts a multitude of repositories that can serve as a rich source of tools, a promising solution is for LLM-based agents to autonomously integrate GitHub repositories according to user queries, thereby extending their tool sets. In this paper, we introduce GitAgent, an agent capable of autonomous tool extension from GitHub. GitAgent follows a four-phase procedure to incorporate repositories, and it can learn from human experience by consulting GitHub Issues/PRs to solve problems encountered during the procedure. An experimental evaluation involving 30 user queries demonstrates GitAgent's effectiveness, with an average success rate of 69.4%.
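
The control flow described in the abstract, a sequence of phases in which a failure triggers a lookup of human experience before retrying, might be pictured roughly as follows. This Python sketch is only an illustration under stated assumptions: the phase callables, the `find_fix` helper, the `hints` key, and the retry limit are all hypothetical and are not taken from the paper or its implementation.

```python
from typing import Callable, Optional, Sequence

# Hypothetical type: a phase takes a context dict and returns an updated one,
# raising RuntimeError when it fails.
Phase = Callable[[dict], dict]
# Hypothetical helper type: maps an error message to a hint (for example,
# one mined from Issues/PRs), or None if nothing relevant is found.
FixFinder = Callable[[str], Optional[str]]


def run_phases(phases: Sequence[Phase], context: dict,
               find_fix: FixFinder, max_retries: int = 2) -> dict:
    """Run phases in order; on failure, look up a hint and retry the same phase."""
    for phase in phases:
        attempts = 0
        while True:
            try:
                context = phase(context)
                break  # phase succeeded, move on to the next one
            except RuntimeError as err:
                attempts += 1
                hint = find_fix(str(err))
                # Give up if no hint exists or the retry budget is exhausted.
                if hint is None or attempts > max_retries:
                    raise
                # Record the hint so the retried phase can make use of it.
                context = {**context, "hints": context.get("hints", []) + [hint]}
    return context
```

In such a sketch, each phase stays a plain function, and the recovery logic lives in one place, so the number and names of the phases can change without touching the retry loop.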