LLM-based Multi-class Attack Analysis and Mitigation Framework in IoT/IIoT Networks

📅 2025-10-30
🤖 AI Summary
The absence of standardized, quantitative benchmarks for AI-based security evaluation in IoT/IIoT hinders rigorous comparison of model effectiveness. Method: This paper proposes the first end-to-end attack analysis and mitigation framework integrating machine learning and large language models (LLMs), featuring structured role-based prompt engineering, retrieval-augmented generation (RAG), and a multi-judge LLM collaborative assessment mechanism. Contribution/Results: It establishes the first quantitative evaluation metric system targeting attack identification, behavioral analysis, and mitigation recommendation. Experiments on Edge-IIoTset and CICIoT2023 demonstrate that Random Forest achieves the highest detection accuracy, while ChatGPT-o3 significantly outperforms baseline LLMs (e.g., DeepSeek) in analytical depth and recommendation quality. This work introduces a reproducible, scalable paradigm for AI-driven IIoT security assessment.

📝 Abstract
The Internet of Things has expanded rapidly, transforming communication and operations across industries but also widening the attack surface and increasing security breaches. Artificial Intelligence plays a key role in securing IoT, enabling attack detection, attack behavior analysis, and mitigation suggestion. Despite these advancements, evaluations remain purely qualitative, and the lack of a standardized, objective benchmark for quantitatively measuring AI-based attack analysis and mitigation hinders consistent assessment of model effectiveness. In this work, we propose a hybrid framework combining Machine Learning (ML) for multi-class attack detection with Large Language Models (LLMs) for attack behavior analysis and mitigation suggestion. After benchmarking several ML and Deep Learning (DL) classifiers on the Edge-IIoTset and CICIoT2023 datasets, we applied structured role-play prompt engineering with Retrieval-Augmented Generation (RAG) to guide ChatGPT-o3 and DeepSeek-R1 in producing detailed, context-aware responses. We introduce novel evaluation metrics for quantitative assessment, and an ensemble of judge LLMs, namely ChatGPT-4o, DeepSeek-V3, Mixtral 8x7B Instruct, Gemini 2.5 Flash, Meta Llama 4, TII Falcon H1 34B Instruct, xAI Grok 3, and Claude 4 Sonnet, independently evaluates the responses. Results show that Random Forest yields the best detection model, and that ChatGPT-o3 outperformed DeepSeek-R1 in attack analysis and mitigation.
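The abstract describes guiding the analyst LLMs with structured role-play prompts augmented by retrieved context. A minimal sketch of that idea is below; the knowledge snippets, function names, and prompt wording are all illustrative assumptions, not the paper's actual implementation (which would use a real vector store and the detected class from the ML stage).

```python
# Toy sketch of role-play prompting with a keyword-overlap "retriever".
# KNOWLEDGE_BASE, retrieve(), and build_prompt() are hypothetical stand-ins.

KNOWLEDGE_BASE = [
    "DDoS attacks flood a target with traffic from many compromised devices.",
    "SQL injection inserts malicious statements into application queries.",
    "MITM attacks intercept traffic between two communicating IoT nodes.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank snippets by word overlap with the query (a stand-in for RAG)."""
    q = set(query.lower().split())
    scored = sorted(KNOWLEDGE_BASE,
                    key=lambda s: len(q & set(s.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(attack_label: str) -> str:
    """Assemble a role-based prompt augmented with retrieved context."""
    context = "\n".join(f"- {s}" for s in retrieve(attack_label))
    return (
        "You are an IIoT security analyst.\n"
        f"Detected attack class: {attack_label}\n"
        f"Relevant context:\n{context}\n"
        "Task: explain the attack behavior and suggest mitigations."
    )

print(build_prompt("DDoS traffic flood"))
```

In the paper's pipeline, the `attack_label` would come from the ML detection stage and the prompt would be sent to ChatGPT-o3 or DeepSeek-R1.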
Problem

Research questions and friction points this paper is trying to address.

Lack of standardized benchmark for AI-based IoT attack analysis
Need quantitative metrics to assess model effectiveness consistently
Require hybrid framework combining ML detection with LLM analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid ML-LLM framework for IoT attack detection
Structured role-play prompts with RAG enhancement
Novel metrics and ensemble LLM evaluation system
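The ensemble evaluation idea above can be sketched as averaging per-metric scores across independent judge LLMs. The metric names follow the paper's three axes (attack identification, behavior analysis, mitigation suggestion), but the score values and aggregation rule here are made-up placeholders, not the paper's actual results or formula.

```python
# Hypothetical aggregation of judge-LLM scores; values are placeholders.
from statistics import mean

METRICS = ("identification", "behavior_analysis", "mitigation")

judge_scores = {  # judge name -> {metric: score on a 1-10 scale}
    "ChatGPT-4o":       {"identification": 9, "behavior_analysis": 8, "mitigation": 9},
    "DeepSeek-V3":      {"identification": 8, "behavior_analysis": 8, "mitigation": 7},
    "Gemini 2.5 Flash": {"identification": 9, "behavior_analysis": 7, "mitigation": 8},
}

def aggregate(scores: dict[str, dict[str, float]]) -> dict[str, float]:
    """Average each metric across all judges."""
    return {m: mean(j[m] for j in scores.values()) for m in METRICS}

per_metric = aggregate(judge_scores)
overall = mean(per_metric.values())
print(per_metric, round(overall, 2))
```

A simple mean is only one of several plausible aggregation choices; a real system might weight judges or report per-metric distributions instead.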