🤖 AI Summary
Conventional data-driven goal recognition (GR) methods can reason only about a predefined set of goals and require time-consuming training whenever new goals emerge.
Method: We propose GRAML (Goal Recognition As Metric Learning), a framework that adapts quickly to new goals instead of being limited to a fixed goal set. By formulating goal recognition as a deep metric learning task, the approach can recognize a new goal given as little as one example observation trace for that goal. It uses a Siamese network with an RNN to learn a metric over an embedding space in which observation traces leading to the same goal are close and traces leading to different goals are distant.
Contribution/Results: Evaluated on a versatile set of environments, GRAML improves speed, flexibility, and runtime over state-of-the-art GR while maintaining accurate recognition. It adapts to newly introduced goals from a single example trace per goal, which reduces dependence on both time-consuming retraining and a predefined goal set.
📝 Abstract
Goal Recognition (GR) is the problem of recognizing an agent's objectives based on observed actions. Recent data-driven approaches for GR alleviate the need for costly, manually crafted domain models. However, these approaches can only reason about a pre-defined set of goals, and time-consuming training is needed for new emerging goals. To keep this model-learning automated while enabling quick adaptation to new goals, this paper introduces GRAML: Goal Recognition As Metric Learning. GRAML uses a Siamese network to treat GR as a deep metric learning task, employing an RNN that learns a metric over an embedding space, where the embeddings for observation traces leading to different goals are distant, and embeddings of traces leading to the same goals are close. This metric is especially useful when adapting to new goals, even if given just one example observation trace per goal. Evaluated on a versatile set of environments, GRAML shows speed, flexibility, and runtime improvements over the state-of-the-art GR while maintaining accurate recognition.
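The metric-learning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny Elman-style RNN, the random untrained weights, the Euclidean distance, the contrastive loss, and all names (`rnn_embed`, `contrastive_loss`, `recognize`, `goal_east`, `goal_north`) are assumptions made for illustration, since the abstract does not specify the RNN cell, the loss, or the distance function.

```python
import math
import random

def rnn_embed(trace, w_in, w_h):
    """Fold a variable-length observation trace into a fixed-size embedding
    using a tiny Elman-style RNN (an illustrative stand-in for the encoder)."""
    h = [0.0] * len(w_h)
    for obs in trace:
        h = [
            math.tanh(
                sum(w * x for w, x in zip(w_in[i], obs))
                + sum(w * s for w, s in zip(w_h[i], h))
            )
            for i in range(len(h))
        ]
    return h

def distance(a, b):
    """Euclidean distance between two embeddings (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(d, same_goal, margin=1.0):
    """Assumed training objective for a Siamese pair: pull same-goal pairs
    together, push different-goal pairs at least `margin` apart."""
    return d ** 2 if same_goal else max(0.0, margin - d) ** 2

def recognize(trace, goal_examples, embed):
    """Return the goal whose single example trace embeds closest to `trace`."""
    z = embed(trace)
    return min(goal_examples, key=lambda g: distance(z, embed(goal_examples[g])))

# Hypothetical setup: random (untrained) weights and 2-D observations.
rng = random.Random(0)
HIDDEN, OBS_DIM = 4, 2
w_in = [[rng.uniform(-1, 1) for _ in range(OBS_DIM)] for _ in range(HIDDEN)]
w_h = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(HIDDEN)]
embed = lambda trace: rnn_embed(trace, w_in, w_h)

# One example trace per goal; adding a goal only means adding an entry here.
goal_examples = {
    "goal_east": [(1.0, 0.0)] * 3,
    "goal_north": [(0.0, 1.0)] * 3,
}
predicted = recognize(goal_examples["goal_east"], goal_examples, embed)
```

In this sketch, supporting a new goal amounts to adding one entry to `goal_examples`; the encoder weights are left untouched, which mirrors the one-example adaptation described in the abstract. A real encoder would first be trained on pairs of traces so that the learned distances are meaningful.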