🤖 AI Summary
Large language models (LLMs) incur high inference energy consumption, hindering their sustainable deployment. This work systematically identifies, for the first time, that prompt semantics and task-relevant keywords, not merely token length, dominate inference energy consumption. We conduct empirical studies across three open-source Transformer architectures (Llama, Falcon, Phi) on question answering, sentiment analysis, and text generation, integrating fine-grained energy profiling, semantic analysis, and task-aware feature modeling. Results demonstrate that response characteristics significantly affect energy use even within the same task, identify cross-task sets of high- and low-energy keywords, and show that semantics-driven prompt optimization consistently reduces inference energy. Our study establishes a novel, energy-aware prompt design paradigm, providing both theoretical foundations and practical methodologies for developing energy-adaptive LLMs.
📝 Abstract
Large Language Models (LLMs) have become widely used across various domains spanning search engines, code generation, and text creation. However, a major concern associated with their adoption is the high cost of inference, impacting both their sustainability and financial feasibility. In this study, we empirically examine how different prompt and response characteristics directly impact LLM inference energy cost. We conduct experiments leveraging three open-source transformer-based LLMs across three task types: question answering, sentiment analysis, and text generation. For each inference, we analyze prompt and response characteristics (length, semantic meaning, time taken, energy consumption). Our results demonstrate that even when presented with identical tasks, models generate responses with varying characteristics and subsequently exhibit distinct energy consumption patterns. We found that prompt length is less significant than the semantic meaning of the task itself. In addition, we identified specific keywords associated with higher or lower energy usage, and these keywords vary across tasks. These findings highlight the importance of prompt design in optimizing inference efficiency. We conclude that the semantic meaning of prompts and certain task-related keywords significantly impact inference costs, paving the way for deeper exploration toward energy-adaptive LLMs.
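The abstract's "fine-grained energy profiling" is not detailed here; one common approach is to sample GPU power draw periodically during an inference and integrate the samples over time. Below is a minimal sketch of that integration step, assuming samples would come from a GPU power API (e.g. NVML's `nvmlDeviceGetPowerUsage`, which reports milliwatts); the function and variable names are hypothetical, not from the paper.

```python
# Sketch: estimate per-inference energy by integrating periodic power samples.
# Each sample is a (timestamp_seconds, power_watts) pair; a real profiler
# would collect these from a GPU power API while the model generates tokens.

def energy_joules(samples):
    """Trapezoidal integration of (timestamp_s, power_watts) samples."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)  # area under the power curve
    return total

# Example: a 2-second inference drawing a steady 100 W costs 200 J.
steady = [(0.0, 100.0), (1.0, 100.0), (2.0, 100.0)]
print(energy_joules(steady))  # -> 200.0
```

Dividing the result by the number of generated tokens gives an energy-per-token figure, which makes prompts of different response lengths comparable.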