Incorporating Token Usage into Prompting Strategy Evaluation

📅 2025-05-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Prior evaluations of large language model (LLM) prompting strategies have focused predominantly on task performance, neglecting computational efficiency: specifically, the trade-off between performance and token consumption. Method: This paper proposes an efficiency-oriented evaluation paradigm built on Big-Oₜₒₖ, a theoretical framework that characterizes the asymptotic growth order of a prompting strategy's token usage, together with Token Cost, an empirical metric of tokens consumed per unit of task performance that exposes diminishing marginal returns as token count increases. Combining theoretical analysis with systematic measurement, the study evaluates mainstream strategies, including zero-shot, few-shot, and chain-of-thought prompting. Results: Experiments reveal substantial token redundancy across most prompting strategies; notably, efficient strategies retain ≥90% of peak task performance while cutting token consumption by 30-60%. This work establishes a quantifiable, comparable benchmark for efficient prompt engineering.

📝 Abstract
In recent years, large language models have demonstrated remarkable performance across diverse tasks. However, their task effectiveness is heavily dependent on the prompting strategy used to elicit output, which can vary widely in both performance and token usage. While task performance is often used to determine prompting strategy success, we argue that efficiency--balancing performance and token usage--can be a more practical metric for real-world utility. To enable this, we propose Big-$O_{tok}$, a theoretical framework for describing the token usage growth of prompting strategies, and analyze Token Cost, an empirical measure of tokens per performance. We apply these to several common prompting strategies and find that increased token usage leads to drastically diminishing performance returns. Our results validate the Big-$O_{tok}$ analyses and reinforce the need for efficiency-aware evaluations.
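As a rough illustration of the Token Cost idea (tokens spent per unit of task performance), the sketch below compares hypothetical strategies. The `token_cost` function, the strategy names, and all numbers are assumptions for illustration, not the paper's actual definitions or measurements.

```python
def token_cost(total_tokens: int, performance: float) -> float:
    """Tokens consumed per unit of performance (lower is better)."""
    if performance <= 0:
        return float("inf")
    return total_tokens / performance

# Illustrative measurements: (avg tokens per query, task accuracy).
strategies = {
    "zero-shot": (120, 0.71),
    "few-shot": (900, 0.78),
    "chain-of-thought": (2400, 0.80),
}

for name, (tokens, acc) in strategies.items():
    print(f"{name}: {token_cost(tokens, acc):.0f} tokens per accuracy point")
```

Under these made-up numbers, chain-of-thought spends roughly 18x more tokens per accuracy point than zero-shot for a 9-point accuracy gain, which is the kind of diminishing return the metric is designed to surface.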
Problem

Research questions and friction points this paper is trying to address.

Evaluating prompting strategies by balancing performance and token usage
Proposing the Big-Oₜₒₖ framework to analyze token growth in prompts
Demonstrating diminishing performance returns with increased token usage
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes Big-Oₜₒₖ for token usage growth analysis
Introduces Token Cost as empirical efficiency metric
Demonstrates diminishing returns from increased tokens
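The Big-Oₜₒₖ framework describes how a strategy's token usage grows asymptotically. A minimal sketch of that style of analysis, assuming simple linear growth models, is below; the functions, overhead constants, and growth factors are illustrative assumptions, not the paper's derivations.

```python
def tokens_zero_shot(n: int) -> int:
    # Fixed instruction overhead plus the input: O(n) in input length n.
    return n + 20

def tokens_few_shot(n: int, k: int = 5, demo_len: int = 150) -> int:
    # k fixed-length demonstrations add a constant k * demo_len term: O(n + k).
    return n + 20 + k * demo_len

def tokens_chain_of_thought(n: int, reasoning_factor: int = 4) -> int:
    # Generated reasoning assumed proportional to input length: still O(n),
    # but with a much larger constant factor than zero-shot.
    return n + 20 + reasoning_factor * n

for n in (100, 1000):
    print(n, tokens_zero_shot(n), tokens_few_shot(n), tokens_chain_of_thought(n))
```

Even when two strategies share the same growth order, the constant factors dominate real cost at practical input sizes, which is why an empirical metric like Token Cost complements the asymptotic view.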