Vision-Based Runtime Monitoring under Varying Specifications using Semantic Latent Representations

📅 2026-05-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes reusable runtime monitors that certify past-time Signal Temporal Logic (ptSTL) specifications from visual observations in partially observable environments, with finite-sample guarantees. The key idea is to use the semantic basis, defined as the vector of atomic robustness scores, as a single prediction target: any formula in the target fragment is evaluated by a deterministic decoder, and one conformal calibration pass covers the entire fragment without formula-specific retraining. A second monitor, the rolling prediction monitor, predicts only current predicate values and reconstructs temporal history online. On a pedestrian-crossroad benchmark, the rolling prediction monitor yields tighter certified bounds at short horizons, while the semantic-basis monitor is up to four times tighter at long horizons. On real-world Waymo driving data, both monitors empirically satisfy the conformal coverage guarantee.
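The online reconstruction of temporal history mentioned above can be illustrated with a minimal sketch. It assumes a single bounded past-time "historically" operator whose robustness is the minimum of the predicate robustness over the last `horizon + 1` steps. The class name `RollingHistorically`, the `update` method, and the treatment of partially filled windows are hypothetical choices for illustration, not the paper's implementation.

```python
from collections import deque


class RollingHistorically:
    """Hypothetical sketch: robustness of a bounded 'historically' operator,
    rebuilt online from per-step predicate values (e.g. model predictions)."""

    def __init__(self, horizon: int) -> None:
        if horizon < 0:
            raise ValueError("horizon must be non-negative")
        # Keep only the last horizon + 1 predicate values.
        self.window = deque(maxlen=horizon + 1)

    def update(self, predicate_value: float) -> float:
        # Assumption: before the window is full, use the samples seen so far.
        self.window.append(predicate_value)
        return min(self.window)


monitor = RollingHistorically(horizon=2)
outputs = [monitor.update(v) for v in [1.0, 0.5, 2.0, 3.0, 4.0]]
```

Because each window entry would be a separate prediction with its own error, longer horizons accumulate more uncertainty, which is consistent with the summary's remark that this monitor is tighter only at short horizons.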

📝 Abstract

We study certified runtime monitoring of past-time signal temporal logic (ptSTL) from visual observations under partial observability. The monitor must infer safety-relevant quantities from images and provide finite-sample guarantees, while being reusable: once trained and calibrated, it should certify any formula in a target fragment without per-formula retraining. For fragments induced by a finite dictionary of temporal atoms, we prove that the semantic basis, the vector of atom robustness scores, is the minimum prediction target within the class of monotone, 1-Lipschitz reusable interfaces: any formula is evaluated by a deterministic decoder derived from the parse tree, and a single conformal calibration pass certifies the entire fragment with no union bound. We also introduce a rolling prediction monitor that predicts only current predicate values and reconstructs temporal history online; this is easier to learn but grows conservative at long horizons. On a pedestrian-crossroad benchmark, the rolling monitor achieves tighter certified bounds at short horizons, while the semantic-basis monitor is up to four times tighter at long horizons. We validate both monitors on real-world Waymo driving data, where both satisfy the conformal coverage guarantee empirically.
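To make the decoder and the single calibration pass concrete, here is a minimal sketch under stated assumptions. It assumes formulas are built from atoms with conjunction and disjunction only (decoded as min and max), and that calibration uses the sup-norm error over the whole semantic basis as the conformal score. All names (`Atom`, `And`, `Or`, `decode`, `conformal_radius`, `certified_lower_bound`) are hypothetical and not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass(frozen=True)
class Atom:
    index: int  # position of the atom in the semantic basis


@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"


@dataclass(frozen=True)
class Or:
    left: "Formula"
    right: "Formula"


Formula = Union[Atom, And, Or]


def decode(formula: Formula, basis: Sequence[float]) -> float:
    """Evaluate robustness by walking the parse tree (min for And, max for Or)."""
    if isinstance(formula, Atom):
        return basis[formula.index]
    left = decode(formula.left, basis)
    right = decode(formula.right, basis)
    return min(left, right) if isinstance(formula, And) else max(left, right)


def conformal_radius(pred, true, alpha: float) -> float:
    """Split-conformal quantile of the sup-norm error over the basis."""
    scores = sorted(
        max(abs(p - t) for p, t in zip(ps, ts)) for ps, ts in zip(pred, true)
    )
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    return math.inf if k > n else scores[k - 1]


def certified_lower_bound(formula: Formula, basis_pred, radius: float) -> float:
    # min/max decoders are 1-Lipschitz in the sup norm, so one radius
    # applies to every formula in the fragment.
    return decode(formula, basis_pred) - radius


calib_pred = [[0.5, 1.0], [0.25, -0.5], [1.0, 0.0]]
calib_true = [[0.25, 1.0], [0.25, 0.0], [0.0, 0.0]]
radius = conformal_radius(calib_pred, calib_true, alpha=0.5)
phi = And(Atom(0), Or(Atom(1), Atom(2)))
bound = certified_lower_bound(phi, [0.75, -1.0, 2.0], radius)
```

In this sketch the radius is computed once from the calibration set and then reused for any formula over the same atoms, which mirrors the abstract's point that no union bound over formulas is needed.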

Problem

Research questions and friction points this paper is trying to address.

runtime monitoring

signal temporal logic

partial observability

visual observations

conformal prediction

Innovation

Methods, ideas, or system contributions that make the work stand out.

semantic latent representations

runtime monitoring

signal temporal logic