🤖 AI Summary
This work addresses the challenges of deploying large language models (LLMs) in mission-critical environments such as the Internet of Battlefield Things (IoBT), where safety, reliability, and policy compliance are paramount. The authors propose a Policy-Aware Edge LLM-RAG framework (PA-LLM-RAG) that integrates policy- and telemetry-informed retrieval-augmented reasoning, lightweight local LLM-based task planning, and a dual-instruction verification mechanism powered by an independent JudgeLLM. By combining deterministic policy constraints with semantic-level instruction validation, the framework blocks non-compliant actions while maintaining low-latency operation. Experimental evaluation in a RoboDK simulation environment shows that a Gemma-2B-based implementation achieves a 100% task success rate with an average response latency of 4.17 seconds, indicating that the framework can balance real-time performance with strict policy adherence in complex, high-risk scenarios.
📝 Abstract
Large Language Models (LLMs) offer a promising interface for intent-driven control of autonomous cyber-physical systems, but their direct use in mission-critical Internet of Battlefield Things (IoBT) environments raises significant safety, reliability, and policy-compliance concerns. This paper presents Policy-Aware Large Language Model Retrieval-Augmented Generation (referred to as PA-LLM-RAG), an edge-deployed LLM orchestration framework for IoBT mission control that integrates retrieval-augmented reasoning and independent command verification. The PA-LLM-RAG framework combines a lightweight retrieval module that grounds decisions in operational policies and telemetry, a locally hosted LLM for mission planning, and a secondary JudgeLLM that validates user-generated commands prior to execution. To evaluate PA-LLM-RAG, we implement a simulated IoBT environment using RoboDK and assess four open-source LLMs across controlled mission scenarios of increasing complexity, including baseline operations, threat detection, coverage recovery, multi-event coordination, and policy-violation requests. Experimental results demonstrate that the framework effectively detects policy-violating commands while maintaining low-latency response suitable for edge deployment. Gemma-2B achieves the highest overall reliability, with a latency of 4.17 s and a 100% success rate. The findings highlight a clear tradeoff between reasoning capacity and responsiveness across models and show that combining deterministic safeguards with JudgeLLM verification significantly improves reliability in LLM-driven IoBT orchestration.
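The pipeline described in the abstract (retrieve policy context, plan with a local LLM, then gate the command behind a deterministic check and a judge model) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the policy entries, the `Command` type, the word-overlap retrieval, the order of the two checks, and the stub planner and judge, which stand in for the planning LLM and the JudgeLLM. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical policy store; the abstract does not list the real policies.
POLICIES: List[Dict[str, str]] = [
    {"text": "drones must not enter the restricted zone",
     "forbidden_action": "enter_restricted_zone"},
    {"text": "sensors must not be disabled during an active threat",
     "forbidden_action": "disable_sensors"},
]


@dataclass
class Command:
    action: str
    target: str


def retrieve(request: str, top_k: int = 1) -> List[Dict[str, str]]:
    """Toy retrieval: rank policies by word overlap with the request."""
    words = set(request.lower().split())
    ranked = sorted(
        POLICIES,
        key=lambda p: len(words & set(p["text"].split())),
        reverse=True,
    )
    return ranked[:top_k]


def violates_policy(command: Command) -> bool:
    """Deterministic safeguard: compare the action against every policy."""
    return any(command.action == p["forbidden_action"] for p in POLICIES)


# Assumed interfaces for the planning LLM and the judge model.
Planner = Callable[[str, List[Dict[str, str]]], Command]
Judge = Callable[[str, Command, List[Dict[str, str]]], bool]


def orchestrate(request: str, planner: Planner, judge: Judge) -> Dict[str, str]:
    """Retrieve context, plan a command, then apply both verification gates."""
    context = retrieve(request)
    command = planner(request, context)
    if violates_policy(command):
        return {"status": "blocked", "reason": "policy", "action": command.action}
    if not judge(request, command, context):
        return {"status": "blocked", "reason": "judge", "action": command.action}
    return {"status": "executed", "reason": "approved", "action": command.action}


def stub_planner(request: str, context: List[Dict[str, str]]) -> Command:
    """Stand-in for the planning LLM: maps keywords to a command."""
    if "restricted" in request.lower():
        return Command("enter_restricted_zone", "drone_1")
    return Command("patrol", "drone_1")


def stub_judge(request: str, command: Command,
               context: List[Dict[str, str]]) -> bool:
    """Stand-in for the JudgeLLM: approve if the action's verb is in the request."""
    return command.action.split("_")[0] in request.lower()
```

In this sketch the deterministic check runs first, so a forbidden action is rejected without spending a judge call; whether the paper orders the checks this way is not stated in the abstract. In a real deployment, `stub_planner` and `stub_judge` would be replaced by calls to locally hosted models.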