🤖 AI Summary
This work addresses the high false-positive rates of existing rule-based static analysis tools for detecting security smells in Infrastructure as Code (IaC), which can lead to significant risks at scale. To balance precision and efficiency, the authors propose a knowledge distillation-driven hybrid analysis framework that first applies symbolic rules to broadly identify candidate security smells and then employs a lightweight student model--trained via distillation from a large language model and refined with pseudo-labeling--to efficiently filter out false positives. Evaluated on real-world IaC datasets, the approach achieves an F1 score of 83%, substantially outperforming baseline methods by 7-42%. Notably, it enables the detection of 60% of security smells by reviewing less than 2% of the code, offering high cost-effectiveness, low latency, offline deployability, and strong data privacy guarantees.
📝 Abstract
Infrastructure as Code (IaC) enables automated provisioning of large-scale cloud and on-premise environments, reducing the need for repetitive manual setup. However, this automation is a double-edged sword: a single misconfiguration in IaC scripts can propagate widely, leading to severe system downtime and security risks. Prior studies have shown that IaC scripts often contain security smells--bad coding patterns that may introduce vulnerabilities--and have proposed static analyzers based on symbolic rules to detect them. Yet, our preliminary analysis reveals that rule-based detection alone tends to over-approximate, producing excessive false positives and increasing the burden of manual inspection. In this paper, we present IntelliSA, an intelligent static analyzer for IaC security smell detection that integrates symbolic rules with neural inference. IntelliSA applies symbolic rules to over-approximate potential smells for broad coverage, then employs neural inference to filter false positives. While an LLM can effectively perform this filtering, reliance on LLM APIs introduces high cost and latency, raises data governance concerns, and limits reproducibility and offline deployment. To address these challenges, we adopt a knowledge distillation approach: an LLM teacher generates pseudo-labels to train a compact student model--over 500x smaller--that learns from the teacher's knowledge and efficiently classifies false positives. We evaluate IntelliSA against two static analyzers and three LLM baselines (Claude-4, Grok-4, and GPT-5) on a human-labeled dataset containing 241 security smells across 11,814 lines of real-world IaC code. Experimental results show that IntelliSA achieves the highest F1 score (83%), outperforming baselines by 7-42%. Moreover, IntelliSA demonstrates the best cost-effectiveness, detecting 60% of security smells while inspecting less than 2% of the codebase.
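The two-stage design described above can be sketched in miniature. This is a hypothetical illustration, not IntelliSA's implementation: the regex rules, the `student_filter` heuristic, and all names are invented stand-ins; in the real system, stage 2 is a distilled neural classifier trained on LLM-generated pseudo-labels.

```python
import re

# Stage 1: symbolic rules that deliberately over-approximate candidate
# security smells for broad coverage (toy rules, invented for illustration).
RULES = {
    "hard_coded_secret": re.compile(r"(password|secret|token)\s*[:=]", re.IGNORECASE),
    "http_without_tls": re.compile(r"http://", re.IGNORECASE),
}

def rule_candidates(lines):
    """Flag every line matching any symbolic rule."""
    hits = []
    for i, line in enumerate(lines, start=1):
        for smell, pattern in RULES.items():
            if pattern.search(line):
                hits.append({"line": i, "smell": smell, "text": line})
    return hits

def student_filter(candidate):
    """Stage 2 stand-in: where the distilled student model would classify
    candidates as true smells vs. false positives. Here, a trivial heuristic
    drops templated values (a common false-positive pattern in IaC)."""
    if candidate["smell"] == "hard_coded_secret" and "{{" in candidate["text"]:
        return False  # variable reference, likely not a literal secret
    return True

def detect(lines):
    """Full pipeline: over-approximate with rules, then filter."""
    return [c for c in rule_candidates(lines) if student_filter(c)]

iac_snippet = [
    "password = 'hunter2'",          # true positive: literal secret
    "password = '{{ vault_pw }}'",   # rule fires, but stage 2 filters it
    "url: http://internal.example",  # true positive: unencrypted transport
]
findings = detect(iac_snippet)
```

The point of the split is that stage 1 stays cheap and exhaustive while stage 2 absorbs the precision burden; swapping the heuristic filter for a small distilled model keeps the whole pipeline offline-deployable, unlike per-candidate LLM API calls.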