Many-Tier Instruction Hierarchy in LLM Agents

📅 2026-04-10

🤖 AI Summary

This work addresses a key weakness of large language model agents: they struggle to reliably follow the highest-privilege instruction when they receive conflicting instructions from many heterogeneous sources under complex, dynamic permission hierarchies. Existing approaches support only a small, fixed number of privilege levels and handle real-world conflicts poorly. To bridge this gap, we propose the Many-Tier Instruction Hierarchy (ManyIH) paradigm, which accommodates an arbitrary number of privilege levels. We also introduce ManyIH-Bench, the first fine-grained and scalable benchmark for ManyIH, spanning up to 12 privilege levels, 853 tasks, and 46 realistic agent scenarios. Experiments show that state-of-the-art models reach only about 40% accuracy when conflicts span many privilege levels, underscoring both the difficulty of the problem and the value of ManyIH-Bench as a step toward safer and more effective agent behavior in complex instruction environments.

📝 Abstract

Large language model agents receive instructions from many sources (system messages, user prompts, tool outputs, and more), each carrying a different level of trust and authority. When these instructions conflict, models must reliably follow the highest-privilege instruction to remain safe and effective. The dominant paradigm, instruction hierarchy (IH), assumes a small, fixed set of privilege levels (typically fewer than five) defined by rigid role labels (e.g., system > user). This is inadequate for real-world agentic settings, where conflicts can arise across far more sources and contexts. In this work, we propose Many-Tier Instruction Hierarchy (ManyIH), a paradigm for resolving conflicts among instructions with arbitrarily many privilege levels. We introduce ManyIH-Bench, the first benchmark for ManyIH. ManyIH-Bench requires models to navigate up to 12 levels of conflicting instructions with varying privileges and comprises 853 agentic tasks (427 coding and 426 instruction-following). It composes constraints generated by LLMs and verified by humans to create realistic, difficult test cases spanning 46 real-world agents. Our experiments show that even current frontier models perform poorly (~40% accuracy) as instruction conflicts scale. This work underscores the urgent need for methods that explicitly target fine-grained, scalable instruction conflict resolution in agentic settings.
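
The abstract states the resolution rule (when instructions conflict, follow the one with the highest privilege) but not how a model should represent or apply it. The following is a minimal Python sketch of that rule, not the authors' method or benchmark code. It assumes privilege is an unbounded integer (higher means more authority) rather than a fixed role label, and that conflicting instructions have already been grouped by a shared key naming the behavior they constrain. Real ManyIH conflicts may be semantic and harder to detect. The `Instruction` class, the `resolve` function, and all source names, keys, and privilege values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    source: str     # hypothetical origin label, e.g. "system", "user", "tool:web_page"
    privilege: int  # higher = more authority; any number of tiers is allowed
    key: str        # hypothetical name of the behavior constrained, e.g. "output_language"
    value: str      # the setting this instruction requests for that behavior


def resolve(instructions: list[Instruction]) -> dict[str, Instruction]:
    """For each constrained key, keep the instruction with the highest privilege.

    Assumption: on equal privilege the earlier instruction is kept
    (strict '>' comparison); this tie-break is arbitrary in this sketch.
    """
    winners: dict[str, Instruction] = {}
    for ins in instructions:
        current = winners.get(ins.key)
        if current is None or ins.privilege > current.privilege:
            winners[ins.key] = ins
    return winners


# Usage: three tiers compete over the output language, two over the length limit.
instructions = [
    Instruction("system", 11, "output_language", "English"),
    Instruction("user", 7, "output_language", "French"),
    Instruction("tool:web_page", 2, "output_language", "Spanish"),
    Instruction("user", 7, "max_words", "100"),
    Instruction("tool:web_page", 2, "max_words", "5000"),
]
resolved = resolve(instructions)
print({k: (v.source, v.value) for k, v in resolved.items()})
# {'output_language': ('system', 'English'), 'max_words': ('user', '100')}
```

Because privilege is a plain integer, adding a thirteenth tier, or inserting a new tier between two existing ones, requires no change to `resolve`. This is the property that separates a many-tier hierarchy from a fixed set of role labels.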

Problem

Research questions and friction points this paper is trying to address.

instruction hierarchy

LLM agents

instruction conflict

privilege levels

agentic systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Many-Tier Instruction Hierarchy

instruction conflict resolution

LLM agents