🤖 AI Summary
This work addresses the challenge of erroneous executions in AIoT-enabled smart homes caused by entity hallucinations in large language models, a problem often exacerbated by existing approaches that either over-query users or execute commands blindly. To resolve this, the authors propose a two-stage intent-aware framework: the first stage employs a semantic firewall to filter out invalid or ambiguous instructions, while the second stage implements a deterministic cascaded verifier that sequentially validates room, device, and capability constraints to ensure physical executability. By decoupling intent understanding from physical execution and integrating state-aware reasoning with rule-driven verification, the method precisely identifies irreducible ambiguities while minimizing user disruption. Experimental results on the HomeBench and SAGE benchmarks demonstrate an Exact Match accuracy of 58.56%, an invalid instruction rejection rate of 87.04%, and a significant improvement in autonomous task success rate from 42.86% to 71.43%.
📝 Abstract
As Large Language Models (LLMs) transition from information providers to embodied agents in the Internet of Things (IoT), they face significant challenges regarding reliability and interaction efficiency. Direct execution of LLM-generated commands often leads to entity hallucinations (e.g., trying to control non-existent devices). Meanwhile, existing iterative frameworks (e.g., SAGE) suffer from the Interaction Frequency Dilemma, oscillating between reckless execution and excessive user questioning. To address these issues, we propose a Dual-Stage Intent-Aware (DS-IA) Framework. This framework separates high-level user intent understanding from low-level physical execution. Specifically, Stage 1 serves as a semantic firewall that filters out invalid instructions and resolves vague commands by checking the current state of the home. Stage 2 then employs a deterministic cascade verifier (a strict, step-by-step rule checker that verifies the room, device, and capability in sequence) to ensure the action is physically possible before execution. Extensive experiments on the HomeBench and SAGE benchmarks demonstrate that DS-IA achieves an Exact Match (EM) rate of 58.56% (outperforming baselines by over 28%) and improves the rejection rate of invalid instructions to 87.04%. Evaluations on the SAGE benchmark further reveal that DS-IA resolves the Interaction Frequency Dilemma by balancing proactive querying with state-based inference. Specifically, it boosts the Autonomous Success Rate (resolving tasks without unnecessary user intervention) from 42.86% to 71.43%, while maintaining high precision in identifying irreducible ambiguities that truly necessitate human clarification. These results underscore the framework's ability to minimize user disturbance through accurate environmental grounding.
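The Stage 2 check described in the abstract (room, then device, then capability, stopping at the first failure) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Command`, `Verdict`, and `verify` names, the nested-dictionary home layout, and the example rooms and devices are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

# Hypothetical home layout (an assumption, not from the paper):
# room name -> device name -> set of supported capabilities.
HomeState = Dict[str, Dict[str, Set[str]]]


@dataclass(frozen=True)
class Command:
    room: str
    device: str
    capability: str


@dataclass(frozen=True)
class Verdict:
    ok: bool
    # Name of the first stage that failed: "room", "device", or "capability".
    failed_stage: Optional[str] = None


def verify(cmd: Command, home: HomeState) -> Verdict:
    """Check room, device, and capability in sequence; stop at the first failure."""
    devices = home.get(cmd.room)
    if devices is None:
        return Verdict(False, "room")
    capabilities = devices.get(cmd.device)
    if capabilities is None:
        return Verdict(False, "device")
    if cmd.capability not in capabilities:
        return Verdict(False, "capability")
    return Verdict(True)


# Illustrative home state; the rooms and devices are made up.
HOME: HomeState = {
    "living_room": {"light": {"turn_on", "turn_off", "set_brightness"}},
    "bedroom": {"air_conditioner": {"turn_on", "turn_off", "set_temperature"}},
}

if __name__ == "__main__":
    print(verify(Command("living_room", "light", "turn_on"), HOME))
    print(verify(Command("bedroom", "light", "turn_on"), HOME))
```

Because each check is a plain lookup, the same command against the same home state always yields the same verdict. Reporting which stage failed lets a caller distinguish a hallucinated room or device from an unsupported action on a real device.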