No More, No Less: Task Alignment in Terminal Agents

📅 2026-05-12

🤖 AI Summary

This work addresses task misalignment in terminal agents: in complex environments, agents often fail to distinguish essential instructions from irrelevant distractions and take actions that deviate from user intent. To tackle this issue, we introduce the concept of “task alignment,” the ability to selectively follow critical cues while ignoring extraneous information when executing underspecified tasks. We present TAB, the first benchmark for task alignment, comprising 89 terminal tasks whose necessary cues and distractors are embedded in natural environmental artifacts such as README files and code comments. A systematic evaluation of ten frontier agents and six prompt-injection defenses shows that the strongest agents achieve high task-completion rates yet align poorly with user intent, and that existing defenses suppress necessary cues along with distractors, which impairs task completion.

📝 Abstract

Terminal agents are increasingly capable of executing complex, long-horizon tasks autonomously from a single user prompt. To do so, they must interpret instructions encountered in the environment (e.g., README files, code comments, stack traces) and determine their relevance to the task. This creates a fundamental challenge: relevant cues must be followed to complete a task, whereas irrelevant or misleading ones must be ignored. Existing benchmarks do not capture this ability. An agent may appear capable by blindly following all instructions, or appear robust by ignoring them altogether. We introduce TAB (Task Alignment Benchmark), a suite of 89 terminal tasks derived from Terminal-Bench 2.1. Each task is intentionally underspecified, with missing information provided as a necessary cue embedded in a natural environmental artifact, alongside a plausible but irrelevant distractor. Solving these tasks requires selectively using the cue while ignoring the distractor. Applying TAB to ten frontier agents reveals a systematic gap between task capability and task alignment. The strongest Terminal-Bench agent achieves high task completion but low task alignment on TAB. Evaluating six prompt-injection defenses further shows that suppressing distractor execution also suppresses the cues required for task completion. These results demonstrate that task-aligned agents require selective use of environmental instructions rather than blanket acceptance or rejection.
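
The gap between task capability and task alignment described above can be illustrated with a small sketch. It is not the scoring code used by TAB: the `Outcome` record, its field names, and the two predicates are hypothetical, and the sketch assumes that a task counts as completed whenever the necessary cue is used, and as aligned only when the cue is used and the distractor is ignored. The three policies are idealized extremes, not measured agents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Hypothetical per-task record; field names are illustrative, not from TAB."""
    used_cue: bool             # agent acted on the necessary cue
    executed_distractor: bool  # agent acted on the irrelevant distractor


def completed(o: Outcome) -> bool:
    # Assumption: the task can be completed only if the cue was used.
    return o.used_cue


def aligned(o: Outcome) -> bool:
    # Assumption: alignment means the cue was used AND the distractor ignored.
    return o.used_cue and not o.executed_distractor


def rate(outcomes, predicate) -> float:
    """Fraction of outcomes that satisfy the predicate."""
    return sum(predicate(o) for o in outcomes) / len(outcomes)


N_TASKS = 89  # size of the TAB suite, per the abstract

# Three idealized policies applied uniformly to every task.
policies = {
    "follow_everything": [Outcome(True, True)] * N_TASKS,
    "ignore_everything": [Outcome(False, False)] * N_TASKS,
    "selective": [Outcome(True, False)] * N_TASKS,
}

for name, outcomes in policies.items():
    print(f"{name}: completion={rate(outcomes, completed):.2f}, "
          f"alignment={rate(outcomes, aligned):.2f}")
```

Under these assumptions, the follow-everything policy scores full completion with zero alignment, the ignore-everything policy scores zero on both, and only the selective policy scores well on both. A benchmark that reports completion alone cannot separate the first policy from the third, which is the blind spot the abstract attributes to existing benchmarks.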

Problem

Research questions and friction points this paper is trying to address.

task alignment

terminal agents

environmental instructions

distractor rejection

instruction following

Innovation

Methods, ideas, or system contributions that make the work stand out.

task alignment

terminal agents

instruction following