Can Small GenAI Language Models Rival Large Language Models in Understanding Application Behavior?

📅 2025-11-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether compact generative AI (GenAI) language models can match large language models (LLMs) in application behavior understanding—particularly malware detection. We systematically evaluate GenAI models of varying scales on code understanding, behavioral analysis, and classification tasks, using accuracy, precision, recall, and F1-score as metrics. Results show that while compact GenAI models achieve marginally lower overall accuracy than LLMs, they attain competitive performance on critical metrics—especially recall and F1-score—and exhibit substantially faster inference, lower memory footprint, and reduced deployment cost. The primary contribution is empirical evidence that compact GenAI models deliver high discriminative capability and computational efficiency under resource constraints, offering a lightweight, reliable, and deployable alternative for application behavior analysis. Rather than replacing LLMs, these models serve as complementary tools, extending practical applicability to edge and latency-sensitive environments.

📝 Abstract
Generative AI (GenAI) models, particularly large language models (LLMs), have transformed multiple domains, including natural language processing, software analysis, and code understanding. Their ability to analyze and generate code has enabled applications such as source code summarization, behavior analysis, and malware detection. In this study, we systematically evaluate the capabilities of both small and large GenAI language models in understanding application behavior, with a particular focus on malware detection as a representative task. While larger models generally achieve higher overall accuracy, our experiments show that small GenAI models maintain competitive precision and recall, offering substantial advantages in computational efficiency, faster inference, and deployment in resource-constrained environments. We provide a detailed comparison across metrics such as accuracy, precision, recall, and F1-score, highlighting each model's strengths, limitations, and operational feasibility. Our findings demonstrate that small GenAI models can effectively complement large ones, providing a practical balance between performance and resource efficiency in real-world application behavior analysis.
Problem

Research questions and friction points this paper is trying to address.

Evaluating small vs large GenAI models for application behavior understanding
Assessing model performance in malware detection as representative task
Comparing accuracy, precision, recall across models for operational feasibility
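The metrics compared across models can be made concrete with a small sketch. This is not code from the paper; it is a minimal illustration of how accuracy, precision, recall, and F1-score are computed for a binary malware classifier, using synthetic labels (1 = malware, 0 = benign):

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # fraction of malware caught
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Synthetic example: 1 = malware, 0 = benign (not data from the study)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(binary_metrics(y_true, y_pred))
# accuracy 0.8, precision 0.75, recall 0.75, f1 0.75
```

Recall is the metric the study highlights for compact models: in malware detection, a missed detection (false negative) is typically costlier than a false alarm, so competitive recall at a fraction of the compute cost is a meaningful operational result.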
Innovation

Methods, ideas, or system contributions that make the work stand out.

Small models maintain competitive precision and recall
Small models offer computational efficiency and faster inference
Small models enable deployment in resource-constrained environments