Rerouting LLM Routers

📅 2025-01-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper exposes a control-plane vulnerability in large language model (LLM) routers: attackers can reliably induce misrouting across diverse models and configurations using query-independent "confounder gadgets," without degrading response quality. The authors introduce the concept of *LLM control-plane integrity*, formalizing the threat through white-box and black-box adversarial attack modeling, perplexity analysis, and robustness evaluation. Empirical validation across multiple open-source and commercial LLM routers demonstrates high attack success rates (>90%), and because gadgets can be optimized to maintain low perplexity, conventional perplexity-based filtering proves ineffective. The paper closes by investigating alternative defenses, establishing both a problem formulation and practical groundwork for secure, reliable LLM orchestration.
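To make the misrouting mechanism concrete, here is a minimal sketch of a threshold-based router and a query-independent gadget prefix. The `complexity_score` heuristic, the `GADGET` string, and the threshold are illustrative inventions for this sketch, not the paper's actual router or attack; real routers use learned classifiers, and the paper's gadgets are adversarially optimized token sequences.

```python
# Hypothetical sketch: a router that sends "hard" queries to a strong
# (expensive) model, and a query-independent prefix that inflates the
# complexity score of ANY query, forcing routing to the strong model.

def complexity_score(query: str) -> float:
    """Toy proxy for query complexity: fraction of long tokens."""
    tokens = query.split()
    if not tokens:
        return 0.0
    hard = sum(1 for t in tokens if len(t) > 8)
    return hard / len(tokens)

def route(query: str, threshold: float = 0.25) -> str:
    """Route to the strong model only when the query looks complex."""
    return "strong" if complexity_score(query) >= threshold else "weak"

# Illustrative "confounder gadget": long, rare-looking tokens that raise
# the score of any query they are prepended to.
GADGET = "heterogeneous nonequilibrium thermodynamically intractable"

query = "what is 2 + 2"
print(route(query))                  # prints "weak"
print(route(GADGET + " " + query))   # prints "strong": misrouted
```

The point of the sketch is that the gadget never depends on the query: one fixed prefix flips the routing decision for arbitrary inputs, which is what makes the attack query-agnostic.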


📝 Abstract
LLM routers aim to balance quality and cost of generation by classifying queries and routing them to a cheaper or more expensive LLM depending on their complexity. Routers represent one type of what we call LLM control planes: systems that orchestrate use of one or more LLMs. In this paper, we investigate routers' adversarial robustness. We first define LLM control plane integrity, i.e., robustness of LLM orchestration to adversarial inputs, as a distinct problem in AI safety. Next, we demonstrate that an adversary can generate query-independent token sequences we call "confounder gadgets" that, when added to any query, cause LLM routers to send the query to a strong LLM. Our quantitative evaluation shows that this attack is successful both in white-box and black-box settings against a variety of open-source and commercial routers, and that confounding queries do not affect the quality of LLM responses. Finally, we demonstrate that gadgets can be effective while maintaining low perplexity, thus perplexity-based filtering is not an effective defense. We finish by investigating alternative defenses.
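The abstract's last point concerns perplexity-based filtering: rejecting inputs whose perplexity under a language model exceeds a threshold. The sketch below shows the idea with a toy add-one-smoothed unigram model; the corpus, threshold, and scoring are stand-ins for illustration, not the paper's setup. The attack defeats this defense because gadgets can be optimized to stay below the threshold, as the fluent example here does.

```python
# Sketch of a perplexity filter: flag inputs that look non-fluent under a
# toy unigram language model. A gadget built from common, fluent tokens
# passes such a filter, which is why the defense fails.
import math
from collections import Counter

CORPUS = "the quick brown fox jumps over the lazy dog the fox runs".split()
unigrams = Counter(CORPUS)
total = sum(unigrams.values())

def perplexity(text: str) -> float:
    """Unigram perplexity with add-one smoothing over the toy corpus."""
    tokens = text.lower().split()
    vocab = len(unigrams) + 1  # +1 for unseen tokens
    log_p = sum(math.log((unigrams[t] + 1) / (total + vocab)) for t in tokens)
    return math.exp(-log_p / max(len(tokens), 1))

def flag(text: str, threshold: float = 15.0) -> bool:
    """Reject inputs whose perplexity exceeds the threshold."""
    return perplexity(text) > threshold

print(flag("the fox jumps over the dog"))  # prints "False": fluent text passes
print(flag("zxq vvw qqpl mmznx"))          # prints "True": junk is flagged
```

A low-perplexity gadget behaves like the first example: it reads as fluent text to the filter while still confounding the router's classifier.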
Problem

Research questions and friction points this paper is trying to address.

LLM Router Vulnerability
Malicious Interference
AI System Security
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM Router Vulnerability
Attack-Induced Model Selection
Defense Strategies for AI Cost Efficiency