Learning Correct Behavior from Examples: Validating Sequential Execution in Autonomous Agents

📅 2026-05-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of verifying the sequential behavior of autonomous agents, which traditionally relies on handcrafted rules, exact sequence matching, or large datasets. The authors propose a method that automatically learns a behavioral model from only 2–10 correct execution traces by combining dominator analysis from compiler theory with the semantic reasoning capabilities of multimodal large language models. A generalized ground-truth model is constructed using Prefix Tree Acceptors, traces are merged through multi-tiered equivalence detection, and new executions are validated via topological subsequence matching. In controlled experiments with only three training traces, the approach achieves high accuracy in detecting product bugs and false successes, handles non-deterministic behavior, and yields explainable results alongside coverage metrics. The authors report that it works across domains including UI testing, code generation, and robotic processes.
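As a rough illustration of the Prefix Tree Acceptor idea mentioned in the summary, the sketch below builds a trie over passing traces and accepts exactly the traces it was built from. It is a minimal sketch, not the authors' implementation: the names `PTANode`, `build_pta`, and `accepts` are hypothetical, states are assumed to be plain string labels, and the generalization step (merging equivalent states) is left out.

```python
from typing import Dict, Sequence


class PTANode:
    """One state of the prefix tree; children are keyed by the next state label."""

    def __init__(self) -> None:
        self.children: Dict[str, "PTANode"] = {}
        self.accepting = False  # True if some passing trace ends here
        self.visits = 0         # number of training traces passing through


def build_pta(traces: Sequence[Sequence[str]]) -> PTANode:
    """Insert every passing trace into a shared prefix tree."""
    root = PTANode()
    for trace in traces:
        node = root
        node.visits += 1
        for label in trace:
            node = node.children.setdefault(label, PTANode())
            node.visits += 1
        node.accepting = True
    return root


def accepts(root: PTANode, trace: Sequence[str]) -> bool:
    """An ungeneralized PTA accepts only the traces it was built from."""
    node = root
    for label in trace:
        node = node.children.get(label)
        if node is None:
            return False
    return node.accepting


# Hypothetical traces, invented for illustration only.
passing = [
    ["open", "login", "dashboard"],
    ["open", "login", "mfa", "dashboard"],
]
pta = build_pta(passing)
print(accepts(pta, passing[1]))            # a training trace
print(accepts(pta, ["open", "dashboard"]))  # an unseen trace
```

Because a bare prefix tree only memorizes its inputs, it is a starting point rather than a validator; the generalization described in the summary comes from the later merging and matching steps.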

📝 Abstract

As autonomous agents become increasingly sophisticated, validating their sequential behavior presents a significant challenge. Traditional testing approaches require manual specification, exact sequence matching, or thousands of training examples. We present a novel algorithm that automatically learns correct behavior from just 2-10 passing execution traces and validates new executions against this learned model. Our approach combines dominator analysis from compiler theory with multimodal large language model-powered semantic understanding to identify essential states and handle non-deterministic behavior. The system constructs a generalized ground truth model using Prefix Tree Acceptors, merges traces through multi-tiered equivalence detection, and validates new executions via topological subsequence matching. In controlled experiments, our system achieved high accuracy in detecting product bugs and false successes using only 3 training traces. This approach provides explainable validation results with coverage metrics and works across diverse domains including UI testing, code generation, and robotic processes.
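One way to picture how dominator analysis and subsequence matching could fit together is sketched below. It rests on assumptions that are not from the paper: states are merged by exact label equality (standing in for the multi-tiered, model-assisted equivalence detection), "essential states" are taken to be the states that dominate a synthetic end node, and validation only checks that those states appear in order in the new trace. All function names and the example traces are hypothetical.

```python
from typing import Dict, List, Sequence, Set

START, END = "<start>", "<end>"  # synthetic entry and exit nodes


def merge_traces(traces: Sequence[Sequence[str]]) -> Dict[str, Set[str]]:
    """Merge traces into one graph; equal labels are treated as the same state."""
    succ: Dict[str, Set[str]] = {START: set(), END: set()}
    for trace in traces:
        path = [START, *trace, END]
        for a, b in zip(path, path[1:]):
            succ.setdefault(a, set()).add(b)
            succ.setdefault(b, set())
    return succ


def dominators(succ: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Classic iterative data-flow computation of dominator sets."""
    nodes = set(succ)
    preds: Dict[str, Set[str]] = {n: set() for n in nodes}
    for a, targets in succ.items():
        for b in targets:
            preds[b].add(a)
    dom = {n: set(nodes) for n in nodes}
    dom[START] = {START}
    changed = True
    while changed:
        changed = False
        for n in nodes - {START}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def essential_states(traces: Sequence[Sequence[str]]) -> List[str]:
    """States on every path from START to END, in dominance order."""
    dom = dominators(merge_traces(traces))
    essential = dom[END] - {START, END}
    # Dominators of one node form a chain, so dominator-set size orders them.
    return sorted(essential, key=lambda s: len(dom[s]))


def validate(traces: Sequence[Sequence[str]], new_trace: Sequence[str]) -> bool:
    """Pass if the essential states occur in order as a subsequence."""
    remaining = iter(new_trace)
    return all(state in remaining for state in essential_states(traces))


passing = [
    ["open", "login", "dashboard", "logout"],
    ["open", "login", "mfa", "dashboard", "logout"],
    ["open", "login", "dashboard", "settings", "logout"],
]
print(essential_states(passing))
print(validate(passing, ["open", "login", "promo", "dashboard", "logout"]))
print(validate(passing, ["open", "login", "logout"]))
```

In this toy setup, optional states such as `mfa` and `settings` are not essential because some passing trace bypasses them, so an extra unseen state in a new trace is tolerated, while a trace that skips `dashboard` is rejected even if it ends in the expected final state.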

Problem

Research questions and friction points this paper is trying to address.

sequential execution validation

autonomous agents

behavior learning

execution traces

correctness validation

Innovation

Methods, ideas, or system contributions that make the work stand out.

dominator analysis

multimodal large language model

Prefix Tree Acceptor