Harnessing Chain-of-Thought Metadata for Task Routing and Adversarial Prompt Detection

📅 2025-03-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Prompt routing and adversarial prompt detection in LLM production environments are hampered by the lack of pre-prompt task-difficulty estimation. This paper introduces "Number of Thoughts" (NofT), the first pre-prompt metric that quantifies task difficulty from chain-of-thought (CoT) reasoning trajectories. By turning CoT metadata into computable difficulty features, NofT supports both intelligent prompt routing and adversarial prompt detection. Methodologically, it pairs a lightweight classifier with quantized distilled models at multiple scales (DeepSeek 1.7B/7B/14B) for efficient deployment. Evaluated on MathInstruct, NofT achieves a 2% end-to-end latency reduction and 95% accuracy in adversarial prompt detection, demonstrating both performance gains and improved robustness.
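The routing idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the step-delimiter regex, the thresholds, and the model-tier names are all assumed for the example.

```python
import re

def number_of_thoughts(cot_trace: str) -> int:
    """Count reasoning steps in a CoT trace by splitting on newlines
    or 'Step N:' markers (delimiter choice is illustrative)."""
    parts = re.split(r"\n+|(?:Step \d+[:.])", cot_trace)
    return len([p for p in parts if p and p.strip()])

def route(cot_trace: str) -> str:
    """Route a prompt to a model tier by NofT thresholds.
    Thresholds and tier names are hypothetical, not from the paper."""
    noft = number_of_thoughts(cot_trace)
    if noft <= 3:
        return "deepseek-1.7b"   # easy prompt -> smallest distilled model
    elif noft <= 8:
        return "deepseek-7b"     # moderate difficulty -> mid-size model
    return "deepseek-14b"        # hard prompt -> largest model
```

In practice the thought count would come from a cheap draft model's CoT trace before the prompt is dispatched, so the threshold comparison adds negligible overhead.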

📝 Abstract
In this work, we propose a metric called Number of Thoughts (NofT) to determine the difficulty of tasks pre-prompting and support Large Language Models (LLMs) in production contexts. By setting thresholds based on the number of thoughts, this metric can discern the difficulty of prompts and support more effective prompt routing. A 2% decrease in latency is achieved when routing prompts from the MathInstruct dataset through quantized, distilled versions of Deepseek with 1.7 billion, 7 billion, and 14 billion parameters. Moreover, this metric can be used to detect adversarial prompts used in prompt injection attacks with high efficacy. The Number of Thoughts can inform a classifier that achieves 95% accuracy in adversarial prompt detection. Our experiments and datasets are available on our GitHub page: https://github.com/rymarinelli/Number_Of_Thoughts/tree/main.
Problem

Research questions and friction points this paper is trying to address.

Measure task difficulty using Number of Thoughts (NofT) metric
Improve prompt routing efficiency for Large Language Models
Detect adversarial prompts with high accuracy (95%)
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Number of Thoughts (NofT) metric
Routes prompts based on difficulty thresholds
Detects adversarial prompts with 95% accuracy
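The detection side can be illustrated with a toy classifier. The paper reports 95% accuracy using NofT to inform a classifier; the version below is only a sketch of the idea, with an assumed decision rule (benign-distribution cutoff at mean plus two standard deviations) and made-up sample counts.

```python
from statistics import mean, stdev

def fit_cutoff(benign_noft_counts):
    """Learn a simple cutoff from benign NofT counts: mean + 2 sigma.
    (Illustrative rule; the paper's classifier is not specified here.)"""
    return mean(benign_noft_counts) + 2 * stdev(benign_noft_counts)

def is_adversarial(noft: int, cutoff: float) -> bool:
    """Flag prompts whose thought count deviates sharply from benign norms,
    e.g. injection payloads that induce unusually long reasoning chains."""
    return noft > cutoff

# Hypothetical NofT counts observed on benign prompts
benign = [2, 3, 3, 4, 2, 3, 4, 3]
cutoff = fit_cutoff(benign)
```

A production version would likely feed NofT alongside other prompt features into a trained classifier rather than relying on a single threshold.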
Ryan Marinelli
University of Oslo, Oslo, Norway
Josef Pichlmeier
BMW Group, Ludwig-Maximilians-Universität, Munich, Germany
Tamas Bisztray
Postdoctoral Researcher at University of Oslo
Cybersecurity · AI Safety · Privacy and Data Protection · Identity Management · Biometrics