🤖 AI Summary
Formal specification of embodied agent behaviors remains challenging because perception-driven, dynamic actions are difficult to encode rigorously in traditional logical frameworks. Method: This paper introduces Embedding Temporal Logic (ETL), a new formal specification language that treats semantic embeddings from pretrained vision and multimodal foundation models as first-class constructs. ETL expresses behavioral properties as distances between ideal and observed representations in embedding space, capturing perceptually grounded dynamics that classical logics struggle to encode. The approach connects embedding-based specification with formal verification, robot planning, and foundation-model-based control. Contribution/Results: In a preliminary evaluation on foundation-model-driven robotic planning tasks, embedding-based specifications proved effective at steering systems toward desirable behaviors, improving the interpretability and controllability of agent behavior.
📝 Abstract
We propose an approach to formally specifying the behavioral properties of systems that rely on a perception model for interactions with the physical world. The key idea is to introduce embeddings -- mathematical representations of real-world concepts -- as a first-class construct in a specification language, where properties are expressed in terms of distances between pairs of ideal and observed embeddings. To realize this approach, we propose a new type of temporal logic called Embedding Temporal Logic (ETL) and describe how it can be used to express a wider range of properties about AI-enabled systems than previously possible. We demonstrate the applicability of ETL through a preliminary evaluation involving planning tasks in robots driven by foundation models; the results are promising, showing that embedding-based specifications can be used to steer a system towards desirable behaviors.
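To make the key idea concrete, the sketch below shows one way an ETL-style atomic predicate might look in code: a pretrained encoder maps a concept description and each observation to vectors, and the predicate holds when the cosine distance between the ideal and observed embeddings falls below a threshold, with standard finite-trace temporal operators layered on top. This is a minimal illustration, not the paper's actual syntax or semantics: the `embed` stub stands in for a real encoder such as CLIP, and the names `close`, `eventually`, `always`, and the threshold `eps` are assumptions introduced here.

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    """Hypothetical stand-in for a pretrained encoder (e.g. a CLIP-style
    model); deterministically maps text to a unit vector for demo purposes."""
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).normal(size=dim)
    return v / np.linalg.norm(v)

def dist(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance between two unit-norm embeddings."""
    return 1.0 - float(a @ b)

def close(ideal: np.ndarray, observed: np.ndarray, eps: float) -> bool:
    """Embedding-distance atomic predicate: observed within eps of ideal."""
    return dist(ideal, observed) <= eps

# Boolean finite-trace temporal operators over per-step truth values.
def eventually(vals) -> bool:  # "F phi": phi holds at some step
    return any(vals)

def always(vals) -> bool:      # "G phi": phi holds at every step
    return all(vals)

# Example: does the observation trace eventually come within eps of the
# ideal concept? (Concept string and threshold are illustrative.)
ideal = embed("gripper holding the red block")
trace = [embed(f"camera frame at t={t}") for t in range(10)]
print(eventually([close(ideal, obs, eps=0.9) for obs in trace]))
```

A full ETL implementation would presumably evaluate such predicates against embeddings of actual sensor observations, and might use a quantitative robustness-style semantics rather than Booleans, but the threshold-on-embedding-distance structure sketched here is the construct the abstract describes.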