🤖 AI Summary
This work addresses a limitation of existing black-box reinforcement learning methods on graph-structured temporal tasks: they often neglect how local perturbations propagate through the network, which leads to low sample efficiency and uninterpretable policies. To overcome this, the authors propose GTL-CIRL, a closed-loop framework that, for the first time, integrates Causal Graph Temporal Logic (Causal GTL) with reinforcement learning. The approach jointly learns policies and formally verifiable causal specifications via counterexample-guided reward shaping and robustness constraints, and uses Gaussian-process-driven Bayesian optimization to fine-tune the parameters of logical templates. By explicitly modeling spatiotemporal dependencies, GTL-CIRL substantially improves exploration efficiency. Experiments on gene regulatory network and power grid tasks show that the method converges faster than baselines and yields clear, interpretable, and formally verifiable policies.
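The counterexample-guided reward shaping described above can be sketched as follows. This is a minimal illustration assuming a toy bounded-eventually template ("within the next H steps, a node value exceeds threshold c"); the function names (`robustness_eventually`, `shaped_reward`, `collect_counterexample`) and the additive weighting scheme are hypothetical, not the paper's actual interface.

```python
# Illustrative sketch, not the paper's implementation. The template
# F_[0,H](x > c) has quantitative robustness max over the window of
# (x_t - c): positive when satisfied, negative when violated.

def robustness_eventually(trace, c, horizon):
    """Robustness of F_[0,horizon](x > c) over a scalar trace."""
    window = trace[: horizon + 1]
    return max(v - c for v in window)

def shaped_reward(base_reward, trace, c, horizon, lam=0.5):
    """Base task reward plus a robustness bonus; lam weighs the logic term."""
    return base_reward + lam * robustness_eventually(trace, c, horizon)

def collect_counterexample(trace, c, horizon):
    """Return the trace as a counterexample when the specification fails."""
    if robustness_eventually(trace, c, horizon) < 0:
        return trace  # fed back to refine the cause template
    return None

good = [0.1, 0.4, 0.9, 0.7]   # satisfies the template: positive bonus
bad = [0.1, 0.2, 0.15, 0.1]   # violates it: logged as a counterexample
print(shaped_reward(1.0, good, c=0.5, horizon=3))
print(collect_counterexample(bad, c=0.5, horizon=3) is not None)
```

The sign of the robustness value is what closes the loop: positive margins shape the reward, negative margins produce counterexamples that drive specification refinement.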
📝 Abstract
Decision-making tasks often unfold on graphs with spatiotemporal dynamics. Black-box reinforcement learning typically overlooks how local changes spread through the network structure, limiting both sample efficiency and interpretability. We present GTL-CIRL, a closed-loop framework that simultaneously learns policies and mines Causal Graph Temporal Logic (Causal GTL) specifications. The method shapes rewards with robustness values, collects counterexamples when predicted effects fail, and uses Gaussian process (GP) driven Bayesian optimization to refine parameterized cause templates. The GP models capture spatial and temporal correlations in the system dynamics, enabling efficient exploration of complex parameter spaces. Case studies on gene regulatory and power networks show faster learning and clearer, verifiable behavior than standard RL baselines.
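The GP-driven Bayesian optimization of template parameters might look roughly like this NumPy sketch for a single parameter (e.g., a threshold c). The toy objective standing in for "average robustness of the mined specification", the RBF kernel, and the UCB acquisition are all assumptions for illustration, not taken from the paper.

```python
import numpy as np

def toy_robustness(c):
    # Assumed stand-in objective: robustness peaks at c = 0.6.
    return -(c - 0.6) ** 2

def rbf(a, b, ls=0.2):
    # Squared-exponential kernel between two 1-D point sets.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-4):
    # Standard GP regression posterior mean and stddev on test points.
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)
    Kss = rbf(x_test, x_test)
    mu = Ks.T @ np.linalg.solve(K, y_train)
    cov = Kss - Ks.T @ np.linalg.solve(K, Ks)
    return mu, np.sqrt(np.clip(np.diag(cov), 0.0, None))

def bayes_opt(n_iters=10, beta=2.0):
    cand = np.linspace(0.0, 1.0, 101)
    xs, ys = [0.0, 1.0], [toy_robustness(0.0), toy_robustness(1.0)]
    for _ in range(n_iters):
        mu, sd = gp_posterior(np.array(xs), np.array(ys), cand)
        nxt = cand[np.argmax(mu + beta * sd)]  # UCB acquisition
        xs.append(nxt)
        ys.append(toy_robustness(nxt))
    return xs[int(np.argmax(ys))]  # best parameter value found

best = bayes_opt()
```

After a handful of evaluations, the search concentrates near the high-robustness region, which is the point of using a GP surrogate: each candidate parameter setting is expensive to evaluate (it requires rolling out the policy), so the surrogate trades cheap posterior queries for costly simulations.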