🤖 AI Summary
This study systematically evaluates the effectiveness and limitations of large language models (LLMs) in question answering over datasets. It covers two distinct paradigms, direct question answering from a dataset file and SQL generation from a relational database schema, and compares state-of-the-art LLMs against smaller, lower-cost models on two datasets whose questions vary in difficulty, under several prompting strategies. The findings point to a trade-off between analytical capability on one side and model scale and resource cost on the other: the large models perform strongly, while the smaller models are cheaper to run but show clear limitations. By delineating the performance boundaries of different model classes in data-driven analytical tasks, this research offers practical guidance for model selection and deployment in real-world applications.
📝 Abstract
This paper investigates the effectiveness of large language models (LLMs) in answering questions over datasets. We examine their performance in two scenarios: (a) directly answering questions given a dataset file as input, and (b) generating SQL queries to answer questions given the schema of a relational database. We also evaluate the impact of different prompting strategies on model performance. The study includes both state-of-the-art LLMs and smaller language models that require fewer resources and operate at lower computational and financial cost. Experiments are conducted on two datasets containing questions of varying difficulty. The results demonstrate the strong performance of the larger, state-of-the-art LLMs, while highlighting the limitations of smaller, more cost-efficient models. These findings contribute to a better understanding of how LLMs can be used in data analytics tasks and of the limitations they face there.