🤖 AI Summary
This study addresses the challenge of deploying large language models in resource-constrained environments due to their high computational costs. To this end, the authors systematically evaluate the performance and efficiency of 16 language models, ranging from 0.5B to 3B parameters, across five categories of NLP tasks. They introduce a novel task-specific efficiency analysis framework and propose a Performance-Efficiency Ratio (PER) metric, which integrates accuracy, throughput, memory footprint, and latency through geometric mean normalization. Experimental results demonstrate that smaller models consistently achieve superior PER scores across all evaluated tasks, offering both quantitative justification and practical guidance for efficient inference deployment in real-world scenarios.
📝 Abstract
Large Language Models achieve remarkable performance but incur substantial computational costs unsuitable for resource-constrained deployments. This paper presents the first comprehensive task-specific efficiency analysis comparing 16 language models across five diverse NLP tasks. We introduce the Performance-Efficiency Ratio (PER), a novel metric integrating accuracy, throughput, memory footprint, and latency through geometric mean normalization. Our systematic evaluation reveals that small models (0.5--3B parameters) achieve superior PER scores across all evaluated tasks. These findings establish quantitative foundations for deploying small models in production environments that prioritize inference efficiency over marginal accuracy gains.
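The abstract does not spell out the PER formula, but a geometric mean over normalized metrics can be sketched as follows. This is a hypothetical reconstruction, not the paper's exact definition: benefit metrics (accuracy, throughput) are normalized as value-over-reference, cost metrics (memory, latency) are inverted as reference-over-value, so every factor points in the "higher is better" direction before taking the geometric mean. The reference values and the function name `per_score` are assumptions for illustration.

```python
import math

def per_score(accuracy, throughput, memory_gb, latency_ms,
              ref_accuracy, ref_throughput, ref_memory_gb, ref_latency_ms):
    """Hypothetical PER sketch: geometric mean of normalized metrics.

    Benefit metrics are scaled as value / reference; cost metrics are
    inverted as reference / value. The paper's actual normalization
    scheme may differ (e.g., min-max across all 16 models).
    """
    factors = [
        accuracy / ref_accuracy,        # benefit: higher accuracy is better
        throughput / ref_throughput,    # benefit: higher throughput is better
        ref_memory_gb / memory_gb,      # cost: lower memory is better
        ref_latency_ms / latency_ms,    # cost: lower latency is better
    ]
    return math.prod(factors) ** (1.0 / len(factors))

# Illustrative (made-up) numbers: a small model vs. a larger reference model.
small = per_score(0.70, 120.0, 2.0, 50.0,    # small model metrics
                  0.78, 25.0, 14.0, 210.0)   # reference (larger) model
large = per_score(0.78, 25.0, 14.0, 210.0,
                  0.78, 25.0, 14.0, 210.0)   # reference scores 1.0 by construction
```

Under this construction the reference model always scores exactly 1.0, so PER > 1 indicates a better performance-efficiency trade-off than the reference, which mirrors the abstract's claim that small models win despite lower raw accuracy.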