🤖 AI Summary
Active learning of Mealy machines under stochastic input–output delays remains challenging due to inefficient sampling strategies and inability to distinguish systems with identical I/O behavior but distinct delay distributions.
Method: We propose a decoupled framework that separates behavioral learning from delay estimation. Leveraging the structural properties of the L* algorithm, we dynamically optimize delay-aware sampling sequences during inference—eliminating redundant queries caused by repeated root-state sampling. Concurrently, we integrate statistical delay estimation techniques to identify and model heterogeneous delay distributions even when input–output traces are indistinguishable.
Contribution/Results: The approach targets the oversampling near the root of the state space that arises when delay sampling is naïvely integrated into $L^*$. Empirical evaluation across a wide range of benchmarks shows that it outperforms this naïve baseline. The authors also investigate its applicability in a realistic, latency-sensitive setting: join-order analysis in a relational database.
📝 Abstract
This paper studies active automata learning (AAL) in the presence of stochastic delays. We consider Mealy machines that have stochastic delays associated with each transition and explore how the learner can efficiently arrive at faithful estimates of those machines, the precision of which crucially relies on repeated sampling of transition delays. While it is possible to naïvely integrate the delay sampling into AAL algorithms such as $L^*$, this leads to considerable oversampling near the root of the state space. We address this problem by conceptually separating the learning of behavior from the learning of delays, so that the learner uses the information gained while learning the logical behavior to arrive at efficient input sequences for collecting the needed delay samples. We place particular emphasis on cases in which identical input/output behaviors might stem from distinct delay characteristics. Finally, we provide empirical evidence that our method outperforms the naïve baseline across a wide range of benchmarks and investigate its applicability in a realistic setting by studying the join order in a relational database.
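To make the oversampling issue concrete, the following minimal sketch counts how often each transition is traversed under two sampling strategies on a small, hypothetical four-state Mealy machine. The machine, the function names, and the greedy reset-free walk are all assumptions made for illustration; the walk is a simple stand-in heuristic and not the algorithm proposed in the paper. It also assumes the machine is strongly connected, so that every transition can be reached without a reset.

```python
from collections import Counter, deque

# Hypothetical Mealy machine: state -> input -> (next_state, output).
MACHINE = {
    "q0": {"a": ("q1", 0), "b": ("q0", 0)},
    "q1": {"a": ("q2", 0), "b": ("q0", 1)},
    "q2": {"a": ("q3", 1), "b": ("q1", 0)},
    "q3": {"a": ("q3", 1), "b": ("q0", 0)},
}
INITIAL = "q0"


def run(machine, start, word, counts):
    """Execute `word` from `start`, counting every transition traversed."""
    state = start
    for inp in word:
        counts[(state, inp)] += 1
        state = machine[state][inp][0]
    return state


def access_sequences(machine, initial):
    """Shortest input word reaching each state from the initial state (BFS)."""
    seqs = {initial: ()}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        for inp, (nxt, _) in machine[state].items():
            if nxt not in seqs:
                seqs[nxt] = seqs[state] + (inp,)
                queue.append(nxt)
    return seqs


def naive_sampling(machine, initial, k):
    """Reset before each sample: replay access sequence + input, k times per transition."""
    counts = Counter()
    seqs = access_sequences(machine, initial)
    for state in machine:
        for inp in machine[state]:
            for _ in range(k):
                run(machine, initial, seqs[state] + (inp,), counts)
    return counts


def path_to_undersampled(machine, start, counts, k):
    """Shortest word from `start` whose last transition has fewer than k samples."""
    queue = deque([(start, ())])
    seen = {start}
    while queue:
        state, word = queue.popleft()
        for inp in machine[state]:
            if counts[(state, inp)] < k:
                return word + (inp,)
        for inp, (nxt, _) in machine[state].items():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, word + (inp,)))
    return None  # every reachable transition already has k samples


def greedy_sampling(machine, initial, k):
    """Reset-free walk that always heads for the nearest under-sampled transition."""
    counts = Counter()
    state = initial
    while True:
        word = path_to_undersampled(machine, state, counts, k)
        if word is None:
            return counts
        state = run(machine, state, word, counts)


if __name__ == "__main__":
    k = 10
    naive = naive_sampling(MACHINE, INITIAL, k)
    greedy = greedy_sampling(MACHINE, INITIAL, k)
    for transition in sorted(naive):
        print(transition, "naive:", naive[transition], "greedy:", greedy[transition])
    print("total steps  naive:", sum(naive.values()), "greedy:", sum(greedy.values()))
```

In the reset-based strategy, the transition leaving the initial state on input `a` lies on the access sequence of almost every other transition, so it collects far more than the `k` samples it needs. The walk-based variant reuses knowledge of the already learned structure to spend fewer steps overall, which is the intuition behind separating behavior learning from delay sampling.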