🤖 AI Summary
Hardware vulnerabilities are hard to patch after deployment and pose persistent risks across product lifecycles, yet existing analyses rely heavily on subjective expert assessments (e.g., the Delphi surveys behind the 2021 MITRE CWE hardware list) and lack statistical rigor and scalability. To address this, we propose LLM-HyPZ, presented as the first data-driven framework for hardware vulnerability discovery: an LLM-assisted hybrid pipeline that integrates zero-shot large language model classification, contextualized embeddings, unsupervised clustering, and prompt-driven summarization. The method automatically identifies hardware-related weaknesses and distills recurring themes from CVE records. Applied to the 2021-2024 CVE corpus (114,836 entries), it identifies 1,742 hardware-related vulnerabilities and synthesizes five recurring themes, including privilege escalation via firmware and BIOS, memory corruption in mobile and IoT systems, and physical access exploits. Among seven benchmarked LLMs, LLaMA 3.3 70B reaches 99.5% classification accuracy on a curated validation set. The framework also supported the MITRE CWE MIHW 2025 update by surfacing 411 of the 1,026 CVEs used for downstream analysis, which reduced expert workload.
📝 Abstract
The rapid growth of hardware vulnerabilities has created an urgent need for systematic and scalable analysis methods. Unlike software flaws, which are often patchable post-deployment, hardware weaknesses remain embedded across product lifecycles, posing persistent risks to processors, embedded devices, and IoT platforms. Existing efforts such as the MITRE CWE Hardware List (2021) relied on expert-driven Delphi surveys, which lack statistical rigor and introduce subjective bias, while large-scale data-driven foundations for hardware weaknesses have been largely absent. In this work, we propose LLM-HyPZ, an LLM-assisted hybrid framework for zero-shot knowledge extraction and refinement from vulnerability corpora. Our approach integrates zero-shot LLM classification, contextualized embeddings, unsupervised clustering, and prompt-driven summarization to mine hardware-related CVEs at scale. Applying LLM-HyPZ to the 2021-2024 CVE corpus (114,836 entries), we identified 1,742 hardware-related vulnerabilities and distilled them into five recurring themes, including privilege escalation via firmware and BIOS, memory corruption in mobile and IoT systems, and physical access exploits. Benchmarking across seven LLMs shows that LLaMA 3.3 70B achieves near-perfect classification accuracy (99.5%) on a curated validation set. Beyond methodological contributions, our framework directly supported the MITRE CWE Most Important Hardware Weaknesses (MIHW) 2025 update by narrowing the candidate search space. Specifically, our pipeline surfaced 411 of the 1,026 CVEs used for downstream MIHW analysis, reducing expert workload and accelerating evidence gathering. These results establish LLM-HyPZ as the first data-driven, scalable approach for systematically discovering hardware vulnerabilities, bridging the gap between expert knowledge and real-world vulnerability evidence.
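To make the zero-shot classification stage more concrete, here is a minimal sketch of how a yes/no hardware filter over CVE descriptions might be wired up. It is an illustration under stated assumptions, not the authors' implementation: the prompt wording, the function names (`parse_label`, `filter_hardware`, `fake_llm`), and the sample CVE IDs and descriptions are all hypothetical, and the keyword-based `fake_llm` merely stands in for a real model call so the example runs offline.

```python
from typing import Callable, Dict, List

# Hypothetical prompt wording; the prompt used in the paper is not given here.
PROMPT = (
    "You are a security analyst. Does the following CVE description "
    "describe a hardware-related vulnerability? Answer YES or NO.\n\n"
    "Description: {description}\nAnswer:"
)

def parse_label(reply: str) -> bool:
    """Map a free-form reply to a boolean; only replies starting with YES count."""
    return reply.strip().upper().startswith("YES")

def filter_hardware(cves: Dict[str, str], ask_llm: Callable[[str], str]) -> List[str]:
    """Return the IDs of CVEs that the model labels as hardware-related."""
    hits = []
    for cve_id, description in cves.items():
        reply = ask_llm(PROMPT.format(description=description))
        if parse_label(reply):
            hits.append(cve_id)
    return hits

def fake_llm(prompt: str) -> str:
    """Keyword stand-in for a real LLM call (assumption, for offline use only)."""
    text = prompt.split("Description:", 1)[1].lower()
    keywords = ("firmware", "bios", "processor")
    return "YES" if any(k in text for k in keywords) else "NO"

# Made-up records for illustration; these are not real CVE entries.
sample = {
    "CVE-0000-0001": "Improper input validation in BIOS firmware allows privilege escalation.",
    "CVE-0000-0002": "SQL injection in a web application login form.",
}

hardware_ids = filter_hardware(sample, fake_llm)
print(hardware_ids)
```

In a real pipeline, `ask_llm` would wrap a call to a hosted or local model, and the retained descriptions would then be passed on to the embedding, clustering, and summarization stages described above.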