🤖 AI Summary
Cost-aware routing balances performance against inference cost, but it is vulnerable to adversarial attacks that manipulate the router into consistently selecting expensive large models, which creates both security and economic risks. This work proposes R²A, a routing attack designed for black-box settings, where existing attacks that depend on white-box access or heuristic prompts are ineffective. R²A builds a hybrid ensemble of surrogate routers to mimic the target router's behavior, then applies a suffix optimization algorithm adapted to that ensemble to generate adversarial suffixes. Experiments on multiple open-source and commercial routing systems show that R²A significantly increases the proportion of queries routed to expensive models, and that the effect holds across queries drawn from different distributions.
📝 Abstract
Cost-aware routing dynamically dispatches user queries to models of varying capability to balance performance and inference cost. However, this routing strategy introduces a new security concern: adversaries may manipulate the router into consistently selecting expensive high-capability models. Existing routing attacks depend on either white-box access or heuristic prompts, rendering them ineffective in real-world black-box scenarios. In this work, we propose R$^2$A, which misleads black-box LLM routers into selecting expensive models via adversarial suffix optimization. Specifically, R$^2$A deploys a hybrid ensemble surrogate router to mimic the black-box router, and a suffix optimization algorithm is adapted to this ensemble-based surrogate. Extensive experiments on multiple open-source and commercial routing systems demonstrate that R$^2$A significantly increases the routing rate to expensive models on queries of different distributions. Code and examples: https://github.com/thcxiker/R2A-Attack.
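To make the two-stage idea concrete (score a query with an ensemble of surrogate routers, then search for a suffix that raises that score), the following is a minimal toy sketch and not the R$^2$A implementation. Everything in it is an assumption made for illustration: the two surrogate routers (`length_router`, `keyword_router`), the uniform ensemble weights, the vocabulary, and the use of random coordinate search in place of the suffix optimization algorithm, which the abstract does not specify.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# A surrogate router maps a query to a score in [0, 1];
# higher means "more likely to be sent to the expensive model".
Router = Callable[[str], float]


def length_router(query: str) -> float:
    """Hypothetical surrogate: longer queries look harder."""
    return min(len(query.split()) / 40.0, 1.0)


def keyword_router(query: str) -> float:
    """Hypothetical surrogate: counts 'hard-looking' keywords."""
    hard = {"prove", "derive", "optimize", "theorem", "complexity"}
    hits = sum(1 for word in query.lower().split() if word in hard)
    return min(hits / 3.0, 1.0)


def ensemble_score(
    routers: Sequence[Router],
    query: str,
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted average of surrogate scores (uniform weights by default)."""
    if weights is None:
        weights = [1.0 / len(routers)] * len(routers)
    return sum(w * router(query) for w, router in zip(weights, routers))


def optimize_suffix(
    query: str,
    routers: Sequence[Router],
    vocab: Sequence[str],
    suffix_len: int = 6,
    steps: int = 200,
    seed: int = 0,
) -> Tuple[str, float]:
    """Random coordinate search: mutate one suffix token at a time and
    keep the mutation only if the ensemble score strictly improves."""
    rng = random.Random(seed)
    suffix: List[str] = [rng.choice(vocab) for _ in range(suffix_len)]
    best = ensemble_score(routers, query + " " + " ".join(suffix))
    for _ in range(steps):
        candidate = list(suffix)
        candidate[rng.randrange(suffix_len)] = rng.choice(vocab)
        score = ensemble_score(routers, query + " " + " ".join(candidate))
        if score > best:
            suffix, best = candidate, score
    return " ".join(suffix), best


if __name__ == "__main__":
    surrogates = [length_router, keyword_router]
    toy_vocab = ["please", "prove", "cat", "derive", "hello", "theorem", "blue"]
    query = "what is 2 + 2"
    suffix, score = optimize_suffix(query, surrogates, toy_vocab)
    print("baseline score:", ensemble_score(surrogates, query))
    print("adversarial suffix:", suffix)
    print("ensemble score with suffix:", score)
```

In a realistic black-box setting the surrogates would presumably be learned models fitted to observed routing decisions, and the resulting suffix would be appended to queries sent to the real router; the sketch only shows how an ensemble score can serve as the search objective.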