Detecting Non-Optimal Decisions of Embodied Agents via Diversity-Guided Metamorphic Testing

📅 2025-12-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses an overlooked problem in the evaluation of embodied agents: Non-optimal Decisions (NoDs), where an agent completes its task successfully but with a suboptimal plan. The problem matters most in resource-constrained settings, yet existing evaluation methods focus on functional correctness. The authors formalize the NoD problem and propose NoD-DGMT, a diversity-guided metamorphic testing framework. It builds on the insight that optimal planners should exhibit invariant behavioral properties under specific transformations, and it introduces four metamorphic relations that capture planning optimality: position detour suboptimality, action optimality completeness, condition refinement monotonicity, and scene perturbation invariance. A diversity-guided test case selection strategy favors cases that explore different violation categories and avoids redundant evaluations. Evaluated on four state-of-the-art planning models in the AI2-THOR simulator, NoD-DGMT achieves an average violation detection rate of 31.9%, a 16.8% relative improvement over the best of six baselines; the diversity-guided filter improves detection rates by 4.3% and diversity scores by 3.3 on average.

📝 Abstract
As embodied agents advance toward real-world deployment, ensuring optimal decisions becomes critical for resource-constrained applications. Current evaluation methods focus primarily on functional correctness, overlooking the non-functional optimality of generated plans. This gap can lead to significant performance degradation and resource waste. We identify and formalize the problem of Non-optimal Decisions (NoDs), where agents complete tasks successfully but inefficiently. We present NoD-DGMT, a systematic framework for detecting NoDs in embodied agent task planning via diversity-guided metamorphic testing. Our key insight is that optimal planners should exhibit invariant behavioral properties under specific transformations. We design four novel metamorphic relations capturing fundamental optimality properties: position detour suboptimality, action optimality completeness, condition refinement monotonicity, and scene perturbation invariance. To maximize detection efficiency, we introduce a diversity-guided selection strategy that actively selects test cases exploring different violation categories, avoiding redundant evaluations while ensuring comprehensive diversity coverage. Extensive experiments on the AI2-THOR simulator with four state-of-the-art planning models demonstrate that NoD-DGMT achieves violation detection rates of 31.9% on average, with our diversity-guided filter improving rates by 4.3% and diversity scores by 3.3 on average. NoD-DGMT significantly outperforms six baseline methods, with 16.8% relative improvement over the best baseline, and demonstrates consistent superiority across different model architectures and task complexities.
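The metamorphic-testing idea in the abstract can be illustrated with a minimal sketch of one relation, scene perturbation invariance. This is not the authors' implementation: the `Scene` type, the `plan_cost` measure (plan length), the `toy_planner`, and the reading of the relation as "adding a task-irrelevant object should not change plan cost" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Sequence

Plan = List[str]  # assumed representation: a plan is a list of action strings


@dataclass(frozen=True)
class Scene:
    """Hypothetical scene model: just the set of object names present."""
    objects: FrozenSet[str]


def plan_cost(plan: Sequence[str]) -> int:
    """Assumed cost measure: the number of actions in the plan."""
    return len(plan)


def violates_scene_perturbation_invariance(
    planner: Callable[[str, Scene], Plan],
    task: str,
    scene: Scene,
    distractor: str,
    tolerance: int = 0,
) -> bool:
    """Return True if adding a task-irrelevant object changes the plan cost.

    Both plans are assumed to succeed; a cost difference then means that at
    least one of the two plans is non-optimal.
    """
    source_plan = planner(task, scene)
    followup_scene = Scene(scene.objects | {distractor})
    followup_plan = planner(task, followup_scene)
    return abs(plan_cost(followup_plan) - plan_cost(source_plan)) > tolerance


def toy_planner(task: str, scene: Scene) -> Plan:
    """Stand-in planner that wastes a step whenever a vase is in the scene."""
    plan = ["find mug", "pick up mug", "find sink", "put mug in sink"]
    if "vase" in scene.objects:
        plan.insert(2, "find vase")  # unnecessary detour
    return plan


kitchen = Scene(frozenset({"mug", "sink"}))
vase_result = violates_scene_perturbation_invariance(
    toy_planner, "put the mug in the sink", kitchen, "vase"
)
book_result = violates_scene_perturbation_invariance(
    toy_planner, "put the mug in the sink", kitchen, "book"
)
```

In this toy setup the vase perturbation exposes a longer follow-up plan, while the book perturbation leaves the plan unchanged. A real harness would replace `toy_planner` with calls to the planning model and the simulator, and would also need to confirm that both plans actually complete the task.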
Problem

Research questions and friction points this paper is trying to address.

Detects non-optimal decisions in embodied agents' task planning
Uses diversity-guided metamorphic testing to identify inefficiencies
Focuses on invariant behavioral properties under specific transformations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diversity-guided metamorphic testing for detecting non-optimal decisions
Four novel metamorphic relations capturing optimality properties
Active selection strategy maximizing test case diversity coverage
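One way to picture a diversity-guided selection step is a greedy farthest-point heuristic over test-case descriptors. The sketch below is an assumption-laden illustration, not the selection strategy from the paper: the tag-set descriptors, the Jaccard distance, and the `select_diverse` function are all hypothetical.

```python
from typing import FrozenSet, List, Sequence


def jaccard_distance(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Distance between two tag sets: 1 minus the Jaccard similarity."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def select_diverse(candidates: Sequence[FrozenSet[str]], budget: int) -> List[int]:
    """Greedily pick up to `budget` candidate indices that are far apart.

    Starts from the first candidate, then repeatedly adds the candidate whose
    minimum distance to the already-selected set is largest. Ties go to the
    lowest index.
    """
    if budget <= 0 or not candidates:
        return []
    selected = [0]
    while len(selected) < min(budget, len(candidates)):
        best_idx, best_score = -1, -1.0
        for i, cand in enumerate(candidates):
            if i in selected:
                continue
            score = min(jaccard_distance(cand, candidates[j]) for j in selected)
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
    return selected


# Hypothetical descriptors: which relation and which scene type a test targets.
tests = [
    frozenset({"detour", "kitchen"}),
    frozenset({"detour", "kitchen"}),        # redundant with the first
    frozenset({"perturbation", "bedroom"}),
    frozenset({"detour", "bedroom"}),
]
chosen = select_diverse(tests, budget=3)
```

With these descriptors the redundant second test is skipped in favor of the two tests that differ most from what has already been chosen. The quadratic scan is fine for small candidate pools; larger pools would call for caching the minimum distances.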
Wenzhao Wu
National Supercomputing Center, Wuxi, Jiangsu 214072, P. R. China
Yahui Tang
School of Computer, Chongqing University of Posts and Telecommunications, Chongqing, 400065, P. R. China
Mingfei Cheng
Singapore Management University
Software Engineering, Software Testing, Autonomous Driving, AI System
Wenbing Tang
College of Information Engineering, Northwest A&F University, Yangling, Shaanxi 712100, P. R. China
Yuan Zhou
School of Computer Science and Technology, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, P. R. China
Yang Liu
College of Computing and Data Science, Nanyang Technological University, Singapore 639798