🤖 AI Summary
This study addresses the high energy consumption of deploying large language models (LLMs) in industrial settings, where existing energy-saving techniques lack empirical validation under real-world conditions. In a production-grade chatbot environment, we present the first systematic evaluation of four green LLM approaches: small-large model collaboration via Nvidia's Prompt Task and Complexity Classifier (NPCC), prompt optimization, quantization, and batching. We examine their trade-offs among energy efficiency, accuracy, and response latency. Our findings reveal that only NPCC-based collaboration achieves substantial energy reduction without compromising model performance; the other methods, while in some cases cutting energy use by up to 90%, incur accuracy degradation too severe for practical use. This work provides empirical evidence and practical guidance for improving the energy efficiency of industrial-scale LLM deployments.
📝 Abstract
The rapid adoption of large language models (LLMs) has raised concerns about their substantial energy consumption, especially when deployed at industry scale. While several techniques have been proposed to address this, limited empirical evidence exists on their effectiveness in LLM-based industry applications. To fill this gap, we analyzed a chatbot application in an industrial context at Schuberg Philis, a Dutch IT services company. We selected four techniques, namely Small and Large Model Collaboration, Prompt Optimization, Quantization, and Batching, applied them to the application in eight variations, and conducted experiments to study their impact on energy consumption, accuracy, and response time compared to the unoptimized baseline. Our results show that several techniques, such as Prompt Optimization and 2-bit Quantization, reduced energy use significantly, sometimes by up to 90%. However, these techniques in particular degraded accuracy, to a degree that is not acceptable in practice. The only technique that achieved substantial energy reductions without seriously harming the other qualities was Small and Large Model Collaboration via Nvidia's Prompt Task and Complexity Classifier (NPCC) with prompt complexity thresholds. This highlights that reducing the energy consumption of LLM-based applications is not difficult in practice; improving their energy efficiency, i.e., reducing energy use without harming other qualities, remains challenging. Our study provides practical insights to move towards this goal.
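The Small and Large Model Collaboration approach named above amounts to threshold-based routing: a classifier scores each prompt's complexity, and only prompts above the threshold go to the large model. The sketch below is illustrative only; the classifier, the placeholder models, and the threshold value are assumptions for the example, not the NPCC setup used in the study.

```python
# Hypothetical sketch of small-large model collaboration with a prompt
# complexity threshold. All names here are placeholders, not the paper's
# implementation.

def route_prompt(prompt, classify, small_model, large_model, threshold=0.5):
    """Send low-complexity prompts to the small (cheaper) model and
    high-complexity prompts to the large model."""
    score = classify(prompt)  # complexity score in [0, 1]
    return small_model(prompt) if score < threshold else large_model(prompt)

# Toy stand-ins: a "classifier" that scores complexity by prompt length,
# and two placeholder models that tag their answers.
classify = lambda p: min(len(p.split()) / 20, 1.0)
small = lambda p: f"[small model] answer to: {p}"
large = lambda p: f"[large model] answer to: {p}"

print(route_prompt("What time is it?", classify, small, large))
print(route_prompt(
    "Compare three database replication strategies and their trade-offs "
    "under network partitions, with examples.", classify, small, large))
```

Because most chatbot traffic tends to be simple, such routing can shift the bulk of requests to the cheaper model, which is the intuition behind the energy savings reported for NPCC.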