Compositional Automata Embeddings for Goal-Conditioned Reinforcement Learning

📅 2024-10-31

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Problem: Popular goal representations in goal-conditioned reinforcement learning, such as target states and natural language, are either limited to Markovian tasks or rely on ambiguous task semantics. Method: We propose compositions of deterministic finite automata (cDFAs) as a formal yet interpretable representation of temporal goals. Observing that every path through a DFA corresponds to a series of reach-avoid tasks, we pre-train graph neural network embeddings on "reach-avoid derived" DFAs and use these embeddings to condition the RL policy, enabling zero-shot transfer and fast, non-hierarchical policy specialization. Contribution/Results: The work integrates cDFAs into goal-conditioned RL, providing strict temporal semantics while preserving human interpretability. The pre-trained embeddings generalize zero-shot to various cDFA task classes and accelerate policy fine-tuning, avoiding the myopic suboptimality of hierarchical methods.
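To make the Boolean, order-sensitive semantics of a cDFA concrete, here is a minimal Python sketch. It assumes the composition is a conjunction (the cDFA is satisfied only when every component DFA accepts) and treats any (state, symbol) pair missing from a transition table as a self-loop; both are simplifying assumptions, and the `DFA` class, `cdfa_accepts`, and the symbols `red`, `blue`, `lava` are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Hashable, List, Sequence, Tuple


@dataclass
class DFA:
    """A DFA over string symbols; (state, symbol) pairs missing from `delta` self-loop."""
    start: Hashable
    accepting: FrozenSet[Hashable]
    delta: Dict[Tuple[Hashable, str], Hashable]

    def accepts(self, word: Sequence[str]) -> bool:
        q = self.start
        for a in word:
            q = self.delta.get((q, a), q)
        return q in self.accepting


def cdfa_accepts(components: List[DFA], word: Sequence[str]) -> bool:
    """Conjunctive cDFA (assumed composition): satisfied iff every component accepts."""
    return all(d.accepts(word) for d in components)


# Hypothetical task: "touch red, then blue" AND "never touch lava".
red_then_blue = DFA(start=0, accepting=frozenset({2}),
                    delta={(0, "red"): 1, (1, "blue"): 2})
avoid_lava = DFA(start="ok", accepting=frozenset({"ok"}),
                 delta={("ok", "lava"): "dead"})
task = [red_then_blue, avoid_lava]

print(cdfa_accepts(task, ["red", "blue"]))          # True
print(cdfa_accepts(task, ["blue", "red"]))          # False: wrong order
print(cdfa_accepts(task, ["red", "lava", "blue"]))  # False: lava was touched
```

The first two traces visit the same set of symbols yet receive different verdicts, which is exactly the kind of history-dependent (non-Markovian) goal that a single target state cannot express.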


📝 Abstract

Goal-conditioned reinforcement learning is a powerful way to control an AI agent's behavior at runtime. That said, popular goal representations, e.g., target states or natural language, are either limited to Markovian tasks or rely on ambiguous task semantics. We propose representing temporal goals using compositions of deterministic finite automata (cDFAs) and use cDFAs to guide RL agents. cDFAs balance the need for formal temporal semantics with ease of interpretation: if one can understand a flow chart, one can understand a cDFA. On the other hand, cDFAs form a countably infinite concept class with Boolean semantics, and subtle changes to the automaton can result in very different tasks, making them difficult to condition agent behavior on. To address this, we observe that all paths through a DFA correspond to a series of reach-avoid tasks and propose pre-training graph neural network embeddings on "reach-avoid derived" DFAs. Through empirical evaluation, we demonstrate that the proposed pre-training method enables zero-shot generalization to various cDFA task classes and accelerated policy specialization without the myopic suboptimality of hierarchical methods.
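The abstract's key observation is that every path through a DFA corresponds to a series of reach-avoid tasks. The Python sketch below illustrates the reverse direction: it chains reach-avoid stages into a DFA and adds a toy random sampler of such chains. The stage encoding, the `SINK` state, and the names `reach_avoid_chain`, `run`, and `sample_stages` are hypothetical, and the sampler is not claimed to match the distribution the paper uses to generate its pre-training DFAs.

```python
import random
from typing import Dict, List, Sequence, Set, Tuple

SINK = "sink"  # absorbing, rejecting state entered when an avoid symbol is seen

Stage = Tuple[Set[str], Set[str]]  # (reach symbols, avoid symbols)


def reach_avoid_chain(stages: Sequence[Stage]):
    """Build a DFA from a sequence of reach-avoid stages.

    In state i, a symbol in reach_i advances to state i + 1, a symbol in avoid_i
    moves to SINK, and every other symbol self-loops. State len(stages) accepts.
    """
    delta: Dict[Tuple[object, str], object] = {}
    for i, (reach, avoid) in enumerate(stages):
        if reach & avoid:
            raise ValueError(f"stage {i}: reach and avoid sets overlap")
        for a in reach:
            delta[(i, a)] = i + 1
        for a in avoid:
            delta[(i, a)] = SINK
    return 0, frozenset({len(stages)}), delta


def run(dfa, word: Sequence[str]) -> bool:
    start, accepting, delta = dfa
    q = start
    for a in word:
        q = delta.get((q, a), q)  # unspecified transitions self-loop (SINK is absorbing)
    return q in accepting


def sample_stages(alphabet: List[str], n_stages: int, n_avoid: int,
                  seed: int = 0) -> List[Stage]:
    """Hypothetical sampler: one reach symbol and n_avoid distinct avoid symbols per stage."""
    rng = random.Random(seed)
    stages = []
    for _ in range(n_stages):
        picked = rng.sample(alphabet, 1 + n_avoid)
        stages.append(({picked[0]}, set(picked[1:])))
    return stages


# Hypothetical task: get the key while avoiding lava, then reach the door
# while avoiding lava and water.
dfa = reach_avoid_chain([({"key"}, {"lava"}), ({"door"}, {"lava", "water"})])
print(run(dfa, ["water", "key", "door"]))  # True: water is only forbidden after the key
print(run(dfa, ["key", "water", "door"]))  # False: water hit during stage 2
print(run(dfa, ["door", "key"]))           # False: door before key does not count
```

Because each stage only constrains which symbols may appear before the next reach symbol, a small change to one avoid set can flip the verdict on many traces, a concrete instance of the abstract's point that subtle automaton edits yield very different tasks.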

Problem

Research questions and friction points this paper is trying to address.

Non-Markovian Representation

Semantic Ambiguity

Formal Temporal Semantics

Innovation

Methods, ideas, or system contributions that make the work stand out.

Compositional Deterministic Finite Automata

Graph Neural Network Embeddings

Zero-shot Generalization
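The graph neural network embedding idea can be illustrated with a toy, untrained message-passing encoder in plain Python. Everything here is an assumption for illustration rather than the paper's architecture: the input features (start and accepting flags), the random fixed weights, the choice to aggregate over successor states while ignoring symbols, the readout at the initial state, and mean pooling as a way to embed a conjunction. The names `embed_dfa` and `embed_cdfa` are hypothetical.

```python
import math
import random
from typing import Dict, Hashable, List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def embed_dfa(states: Sequence[Hashable], start: Hashable, accepting: set,
              delta: Dict[Tuple[Hashable, str], Hashable],
              dim: int = 4, rounds: int = 3, seed: int = 0) -> Vector:
    """Toy message-passing encoder over a DFA's transition graph (untrained weights).

    Each state aggregates its successors' vectors, so after k rounds a state's
    vector depends on the states reachable from it within k transitions.
    """
    rng = random.Random(seed)
    w_in = random_matrix(dim, 2, rng)  # lifts [is_start, is_accepting] to `dim`
    w_self = random_matrix(dim, dim, rng)
    w_msg = random_matrix(dim, dim, rng)

    h = {q: matvec(w_in, [float(q == start), float(q in accepting)]) for q in states}

    # Distinct successors of each state, in insertion order (symbols are ignored here).
    succ: Dict[Hashable, List[Hashable]] = {q: [] for q in states}
    for (q, _symbol), q_next in delta.items():
        if q_next not in succ[q]:
            succ[q].append(q_next)

    for _ in range(rounds):
        new_h = {}
        for q in states:
            nbrs = succ[q] or [q]  # a state with no entries in `delta` only self-loops
            agg = [sum(h[n][i] for n in nbrs) / len(nbrs) for i in range(dim)]
            pre = [a + b for a, b in zip(matvec(w_self, h[q]), matvec(w_msg, agg))]
            new_h[q] = [math.tanh(x) for x in pre]
        h = new_h
    return h[start]  # read out the embedding at the initial state


def embed_cdfa(components, **kwargs) -> Vector:
    """Mean-pool component DFA embeddings (an assumed way to embed a conjunction)."""
    embs = [embed_dfa(*c, **kwargs) for c in components]
    return [sum(col) / len(embs) for col in zip(*embs)]


# Hypothetical example: "red then blue, never lava" as one DFA with a sink state.
states = [0, 1, 2, "sink"]
delta = {(0, "red"): 1, (1, "blue"): 2, (0, "lava"): "sink", (1, "lava"): "sink"}
emb = embed_dfa(states, 0, {2}, delta)
print([round(x, 3) for x in emb])
```

In a real system the weights would be learned during pre-training so that embeddings of DFAs describing similar tasks end up close together; this sketch only shows the data flow from automaton structure to a fixed-size vector that a policy could be conditioned on.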
