LATMOS: Latent Automaton Task Model from Observation Sequences

📅 2025-03-11

🤖 AI Summary

Problem: In service robot task planning, task decomposition, state perception, and execution verification are still handled without coordination, which remains a critical challenge. Method: This paper proposes an observation-driven framework for learning implicit finite-state machines (FSMs) that integrates automata theory with latent-space representation learning. A deep multimodal encoder (combining CNNs and Transformers) jointly models images, videos, language, and robot states to enable symbolic latent-space modeling and automatic FSM induction, without requiring labeled state transitions. Contribution/Results: The framework unifies task-structure discovery and execution assurance, yielding interpretable and formally verifiable implicit FSMs. Evaluated on logical tasks, human behavior videos, and real-world robot deployments, it achieves significantly higher planning success rates and verifiability than prior end-to-end and rule-based approaches, and it generalizes across these domains.

📝 Abstract

Robot task planning from high-level instructions is an important step towards deploying fully autonomous robot systems in the service sector. Three key aspects of robot task planning present challenges yet to be resolved simultaneously, namely, (i) factorization of complex task specifications into simpler executable subtasks, (ii) understanding of the current task state from raw observations, and (iii) planning and verification of task executions. To address these challenges, we propose LATMOS, an automata-inspired task model that, given observations from correct task executions, is able to factorize the task, while supporting verification and planning operations. LATMOS combines an observation encoder to extract the features from potentially high-dimensional observations with automata theory to learn a sequential model that encapsulates an automaton with symbols in the latent feature space. We conduct extensive evaluations in three task model learning setups: (i) abstract tasks described by logical formulas, (ii) real-world human tasks described by videos and natural language prompts and (iii) a robot task described by image and state observations. The results demonstrate the improved plan generation and verification capabilities of LATMOS across observation modalities and tasks.

Problem

Research questions and friction points this paper is trying to address.

Factorize complex tasks into simpler subtasks

Understand task state from raw observations

Plan and verify task executions effectively

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automata-inspired task model for robot planning

Observation encoder extracts features from high-dimensional observations

Supports task factorization, verification, and planning
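
To make the three ideas above concrete, here is a minimal toy sketch in Python. It is not the LATMOS implementation: the real model learns its encoder and automaton in a continuous latent space, whereas this sketch assumes a hand-written 1-D "encoder" that snaps each observation to the nearest of three made-up symbol prototypes, and a discrete transition table built only from correct demonstrations. All names (`PROTOTYPES`, `encode`, `learn_automaton`, `verify`, `plan`) are hypothetical.

```python
from collections import deque

START = "START"

# Hypothetical symbol prototypes; a learned encoder would replace this table.
PROTOTYPES = {"idle": 0.0, "grasped": 1.0, "placed": 2.0}


def encode(observation: float) -> str:
    """Map a raw observation to the symbol with the nearest prototype."""
    return min(PROTOTYPES, key=lambda s: abs(PROTOTYPES[s] - observation))


def learn_automaton(demonstrations):
    """Build transitions and accepting states from correct executions only.

    The automaton state is simply the last symbol seen (a toy assumption).
    """
    delta, accepting = {}, set()
    for demo in demonstrations:
        state = START
        for obs in demo:
            symbol = encode(obs)
            delta[(state, symbol)] = symbol
            state = symbol
        accepting.add(state)  # a correct execution ends in an accepting state
    return delta, accepting


def verify(delta, accepting, observations) -> bool:
    """Accept a sequence only if every step is a known transition and it ends accepted."""
    state = START
    for obs in observations:
        symbol = encode(obs)
        if (state, symbol) not in delta:
            return False
        state = delta[(state, symbol)]
    return state in accepting


def plan(delta, accepting, state=START):
    """Breadth-first search for a shortest symbol sequence reaching an accepting state."""
    queue, seen = deque([(state, [])]), {state}
    while queue:
        current, path = queue.popleft()
        if current in accepting:
            return path
        for (src, symbol), dst in delta.items():
            if src == current and dst not in seen:
                seen.add(dst)
                queue.append((dst, path + [symbol]))
    return None


demos = [[0.1, 0.9, 2.1], [0.0, 0.2, 1.1, 1.9]]
delta, accepting = learn_automaton(demos)
print(verify(delta, accepting, [0.05, 1.0, 2.0]))  # follows a demonstrated order
print(verify(delta, accepting, [0.0, 2.0]))        # skips the grasp step
print(plan(delta, accepting))
```

In this toy, the visited states play the role of subtasks (factorization), `verify` rejects executions that skip or reorder them, and `plan` returns the remaining symbol sequence from any known state, for example `plan(delta, accepting, "grasped")`.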
