🤖 AI Summary
This work proposes a reusable runtime monitoring approach for the certified verification of past-time Signal Temporal Logic (ptSTL) specifications from visual observations in partially observable environments, with finite-sample guarantees. The key idea is the semantic basis, a vector of atom robustness scores, used as a single prediction target. A deterministic decoder evaluates any formula in the fragment from this vector, so one conformal calibration pass covers an entire logical fragment without formula-specific retraining. The work also introduces a rolling prediction monitor that predicts only current predicate values and reconstructs the temporal history online. On a pedestrian-crossroad benchmark, the rolling-prediction monitor yields tighter certified bounds at short horizons, while the semantic-basis monitor yields up to four times tighter bounds at long horizons. On real-world Waymo driving data, both monitors empirically satisfy the conformal coverage guarantee.
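As a rough illustration of why one calibration pass can cover a whole fragment, the following Python sketch assumes a positive Boolean fragment in which atoms are combined with min (conjunction) and max (disjunction); negated atoms would be added to the dictionary as atoms of their own. The parse-tree classes `Atom`, `And`, `Or`, the helpers `decode`, `conformal_quantile`, `certify`, and the synthetic data are all hypothetical and are not the authors' code. Because min and max are monotone and 1-Lipschitz in the sup norm, calibrating the largest per-atom error once bounds the error of every formula decoded from the same prediction, with no union bound over formulas.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


# Hypothetical parse-tree nodes for a positive Boolean fragment over temporal atoms.
@dataclass(frozen=True)
class Atom:
    index: int  # position of this atom's robustness score in the semantic basis


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"


Formula = Union[Atom, And, Or]


def decode(phi: Formula, basis: Sequence[float]) -> float:
    """Evaluate formula robustness from atom scores (min for And, max for Or)."""
    if isinstance(phi, Atom):
        return basis[phi.index]
    if isinstance(phi, And):
        return min(decode(phi.left, basis), decode(phi.right, basis))
    return max(decode(phi.left, basis), decode(phi.right, basis))


def conformal_quantile(
    preds: List[List[float]], truths: List[List[float]], alpha: float
) -> float:
    """Split-conformal quantile of the sup-norm error over all atoms (one pass)."""
    scores = sorted(
        max(abs(p - t) for p, t in zip(pv, tv)) for pv, tv in zip(preds, truths)
    )
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else scores[k - 1]


def certify(phi: Formula, pred_basis: Sequence[float], q: float) -> Tuple[float, float]:
    """Interval containing the true robustness whenever every atom error is <= q."""
    center = decode(phi, pred_basis)
    return center - q, center + q


# Usage on synthetic data (3 atoms, Gaussian prediction noise).
random.seed(0)
NUM_ATOMS = 3
truth_cal = [[random.uniform(-1, 1) for _ in range(NUM_ATOMS)] for _ in range(200)]
pred_cal = [[t + random.gauss(0, 0.1) for t in row] for row in truth_cal]
q = conformal_quantile(pred_cal, truth_cal, alpha=0.1)

phi = Or(And(Atom(0), Atom(1)), Atom(2))  # (a0 and a1) or a2
lo, hi = certify(phi, [0.4, -0.2, 0.1], q)  # interval centered at 0.1
```

The same `q` would serve any other formula built from these atoms, which is the reuse property in miniature.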
📝 Abstract
We study certified runtime monitoring of past-time signal temporal logic (ptSTL) from visual observations under partial observability. The monitor must infer safety-relevant quantities from images and provide finite-sample guarantees, while being reusable: once trained and calibrated, it should certify any formula in a target fragment without per-formula retraining. For fragments induced by a finite dictionary of temporal atoms, we prove that the semantic basis, the vector of atom robustness scores, is the minimum prediction target within the class of monotone, 1-Lipschitz reusable interfaces: any formula is evaluated by a deterministic decoder derived from its parse tree, and a single conformal calibration pass certifies the entire fragment with no union bound. We also introduce a rolling prediction monitor that predicts only current predicate values and reconstructs the temporal history online; this target is easier to learn, but its bounds grow conservative at long horizons. On a pedestrian-crossroad benchmark, the rolling monitor achieves tighter certified bounds at short horizons, while the semantic-basis monitor is up to 4 times tighter at long horizons. We validate both monitors on real-world Waymo driving data, where both empirically satisfy the conformal coverage guarantee.
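The rolling alternative and its long-horizon behavior can be sketched in the same way. This Python sketch assumes a single scalar predicate, the past-time operators "once" (max over the window) and "historically" (min over the window), and a Bonferroni split of the miscoverage level alpha across the H window steps to obtain joint coverage. That union-bound construction is an illustrative assumption and not necessarily the paper's; `RollingMonitor`, `split_conformal_q`, and all data are hypothetical. The point it shows is that the per-step margin q must grow with H, which is one way rolling bounds can become conservative at long horizons.

```python
import math
import random
from collections import deque
from typing import Deque, Dict, List, Tuple


def split_conformal_q(scores: List[float], alpha: float) -> float:
    """Split-conformal quantile of absolute prediction errors at level alpha."""
    s = sorted(scores)
    n = len(s)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else s[k - 1]


class RollingMonitor:
    """Hypothetical rolling monitor for one predicate over the last `horizon` steps."""

    def __init__(self, horizon: int, cal_errors: List[float], alpha: float) -> None:
        self.horizon = horizon
        # Assumed Bonferroni split: each step is calibrated at alpha / horizon so that
        # all `horizon` per-step intervals hold jointly with probability >= 1 - alpha.
        self.q = split_conformal_q(cal_errors, alpha / horizon)
        self.history: Deque[float] = deque(maxlen=horizon)

    def step(self, predicted_value: float) -> Dict[str, Tuple[float, float]]:
        """Push the current predicate prediction and return certified intervals."""
        self.history.append(predicted_value)
        once = max(self.history)  # "once within the window": max over reconstructed history
        hist = min(self.history)  # "historically within the window": min over history
        return {
            "once": (once - self.q, once + self.q),
            "historically": (hist - self.q, hist + self.q),
        }


# Usage: the per-step margin q grows as the horizon lengthens.
random.seed(1)
cal_errors = [abs(random.gauss(0, 0.1)) for _ in range(1000)]
margins = {h: RollingMonitor(h, cal_errors, alpha=0.1).q for h in (1, 10, 50)}

monitor = RollingMonitor(horizon=2, cal_errors=cal_errors, alpha=0.1)
for value in (0.2, -0.1, 0.5):
    bounds = monitor.step(value)  # window after the last step: [-0.1, 0.5]
```

Under this assumed construction, once the horizon is long enough that the required rank exceeds the number of calibration samples, `split_conformal_q` returns an infinite margin, the limiting case of conservatism.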