Redundant or Necessary? A Benchmark for Detecting Redundant Steps in Agent Trajectories

📅 2026-05-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current evaluations of large language model (LLM) agents predominantly focus on task success or failure and often overlook redundant steps in the execution process, which cause inefficiency and waste resources. This work formally defines the problem of "redundant step detection" for the first time, establishing it as a novel research direction, and introduces RedundancyBench, the first dedicated benchmark comprising diverse tasks and agent trajectories with fine-grained, step-level annotations. Systematic evaluation using LLM-based trajectory analysis, human annotations, and three representative methods reveals that even the best-performing approach achieves a score of only 24.88%, and some methods perform worse than random guessing. These findings underscore how difficult redundancy detection is and lay the groundwork for systematic assessment of agent execution efficiency.

📝 Abstract

LLM-based agents have demonstrated strong capabilities in solving complex tasks through multi-step reasoning and tool use. However, existing evaluation protocols primarily focus on task success, overlooking a critical aspect of agent behavior: execution efficiency. In practice, agent trajectories often contain redundant steps that consume substantial resources while contributing little to task completion. In this work, we propose and formulate a new research area: \textbf{redundant step detection} for agent trajectories. To support this initiative, we introduce \textbf{RedundancyBench}, a new benchmark that contains diverse tasks with carefully annotated trajectories, where each step is labeled according to its contribution to task completion. Using RedundancyBench, we develop and evaluate three representative methods that decide whether a step within a trajectory is redundant or necessary. Our results show that even the best-performing method achieves a score of only 24.88\% in detecting redundant steps, while some methods perform worse than random guessing. These results highlight the task's complexity and the need for further research in this area. \footnote{The code and dataset for this paper are both available at \href{https://anonymous.4open.science/r/RedundancyBench}{https://anonymous.4open.science/r/RedundancyBench}.}
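To make the task concrete, the following is a minimal sketch of per-step redundancy labeling and scoring. It is not the benchmark's actual schema or metric: the `Step` fields, the example trajectory, and the choice of F1 on the "redundant" class are all assumptions made for illustration, since the abstract only says that each step is labeled and that methods receive a "score".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One step of an agent trajectory (hypothetical schema)."""
    action: str       # what the agent did, as free text
    redundant: bool   # assumed gold label: True if the step did not help the task


def redundant_f1(gold: List[bool], pred: List[bool]) -> float:
    """F1 for the 'redundant' class; an assumed metric, not the paper's."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented example trajectory: two of the five steps add nothing.
trajectory = [
    Step("search('flight prices')", False),
    Step("search('flight prices')", True),   # exact repeat of the previous query
    Step("open(result_1)", False),
    Step("open(result_7)", True),            # page never used afterwards
    Step("book(result_1)", False),
]
gold = [s.redundant for s in trajectory]

# Trivial baseline: flag every step as redundant.
always_redundant = [True] * len(gold)
baseline_score = redundant_f1(gold, always_redundant)
```

A detector would replace `always_redundant` with its own per-step predictions. Comparing against a trivial baseline like this one is one way to read the abstract's remark that some methods fall below random guessing.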

Problem

Research questions and friction points this paper is trying to address.

redundant step detection

agent trajectories

execution efficiency

LLM-based agents

RedundancyBench

Innovation

Methods, ideas, or system contributions that make the work stand out.

redundant step detection

agent trajectory

execution efficiency