Do Agents Know What They Can't Do? Evaluating Feasibility Awareness in Tool-Using Agents

📅 2026-05-27
🤖 AI Summary
Existing tool-augmented agents struggle to recognize infeasible tasks, which often leads to futile reasoning and wasted computation. To address this limitation, this work proposes FeasiGen, an automatic pipeline that extracts tool-calling traces from successful executions across multiple agent systems, identifies the critical tools consistently shared across these diverse execution strategies, and masks those tools to turn solvable tasks into infeasible ones. Human verification confirms that the resulting infeasibility annotations are over 94% accurate. The study further introduces feasibility-aware evaluation metrics, which show that current models often keep executing infeasible tasks, with a false continue rate of up to 73.9% across the nine models evaluated. Multi-agent architectures, by contrast, substantially reduce such erroneous execution, suggesting they are better at discerning task feasibility.
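The construction step described above (find the tools every successful strategy relied on, then remove them) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each agent system contributes one successful trajectory recorded as a list of tool names, and it reads "consistently shared" as a strict set intersection (a frequency threshold would be an equally plausible reading). The functions `critical_tools` and `make_infeasible`, the task dictionary layout, and all agent and tool names are hypothetical. The sketch also does not replace the human verification step the paper uses to confirm infeasibility.

```python
from typing import Dict, List, Optional, Set

# Assumed representation (hypothetical): one successful trajectory per agent
# system, recorded as the list of tool names that system called on the task.
Trajectory = List[str]


def critical_tools(trajectories: Dict[str, Trajectory]) -> Set[str]:
    """Return the tools that every successful trajectory called.

    Assumption: "consistently shared" is read as a strict set intersection.
    """
    tool_sets = [set(calls) for calls in trajectories.values()]
    if not tool_sets:
        return set()
    return set.intersection(*tool_sets)


def make_infeasible(task: dict, trajectories: Dict[str, Trajectory]) -> Optional[dict]:
    """Mask the critical tools so the task can no longer be solved as-is."""
    crit = critical_tools(trajectories)
    if not crit:
        # No tool is shared by all strategies, so masking would not clearly
        # make the task infeasible; skip this task.
        return None
    remaining = [tool for tool in task["tools"] if tool not in crit]
    return {**task, "tools": remaining, "masked_tools": sorted(crit), "feasible": False}


if __name__ == "__main__":
    # Hypothetical agent-system names and tool names, for illustration only.
    traces = {
        "react": ["web_search", "calculator"],
        "plan_and_execute": ["calculator", "web_search", "read_file"],
        "reflexion": ["calculator", "python"],
    }
    task = {"id": "t1", "tools": ["web_search", "calculator", "read_file", "python"]}
    print(make_infeasible(task, traces))
    # {'id': 't1', 'tools': ['web_search', 'read_file', 'python'], 'masked_tools': ['calculator'], 'feasible': False}
```

Only `calculator` appears in all three traces, so it is the single tool masked; `web_search` is left in place because one strategy solved the task without it.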
📝 Abstract
Tool-using agents often incur substantial computational cost due to long reasoning chains and iterative tool usage. In practical scenarios, many tasks become infeasible under constrained tool environments, where the capabilities required for successful task completion are unavailable. Detecting infeasible tasks and stopping execution early can significantly reduce unnecessary execution cost. In this work, we propose FeasiGen, an automatic pipeline for constructing infeasible agent tasks by identifying the critical tools required for successful task completion. Our approach extracts tool-calling traces from successful executions across multiple agent systems, identifies critical tools consistently shared across diverse execution strategies, and masks these tools to automatically transform solvable tasks into infeasible ones. Human verification confirms that the infeasibility annotations for our constructed tasks achieve over 94% accuracy. We further introduce feasibility-aware evaluation metrics for measuring whether agents can recognize infeasible tasks and stop execution appropriately. Extensive evaluations across nine models reveal a substantially weak ability to detect infeasible tasks, with the false continue rate reaching up to 73.9%. We also observe that multi-agent architectures significantly reduce erroneous execution under infeasible conditions.
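The abstract names the false continue rate but does not give its formula. A natural reading, assumed here, is the fraction of infeasible tasks on which the agent keeps executing instead of flagging infeasibility and stopping: FCR = (infeasible tasks where the agent continued) / (all infeasible tasks). The sketch below computes that quantity; the `Episode` record and its field names are hypothetical, and the paper's exact definition (for example, how a partial stop is counted) may differ.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Episode:
    feasible: bool       # ground-truth label from the constructed dataset
    agent_stopped: bool  # True if the agent flagged the task as infeasible and halted


def false_continue_rate(episodes: Iterable[Episode]) -> float:
    """Share of infeasible episodes in which the agent kept executing (assumed definition)."""
    infeasible = [e for e in episodes if not e.feasible]
    if not infeasible:
        raise ValueError("false continue rate is undefined without infeasible episodes")
    continued = sum(1 for e in infeasible if not e.agent_stopped)
    return continued / len(infeasible)


if __name__ == "__main__":
    runs = [
        Episode(feasible=False, agent_stopped=False),
        Episode(feasible=False, agent_stopped=False),
        Episode(feasible=False, agent_stopped=False),
        Episode(feasible=False, agent_stopped=True),
        Episode(feasible=True, agent_stopped=False),  # feasible tasks are ignored
    ]
    print(false_continue_rate(runs))  # 0.75
```

Feasible episodes are excluded from the denominator, so the metric isolates how often an agent fails to stop when it should; a companion measure on feasible tasks would be needed to penalize agents that stop too eagerly.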
Problem

Research questions and friction points this paper is trying to address.

feasibility awareness
tool-using agents
infeasible tasks
execution cost
task completion
Innovation

Methods, ideas, or system contributions that make the work stand out.

feasibility awareness
tool-using agents
infeasible task generation
FeasiGen
execution cost reduction