AI Summary
This work addresses the need for autonomous agents in planetary exploration missions to interpret high-level tasks under communication constraints and without access to global positioning. To bridge natural language understanding with formal reasoning, the study proposes a method for translating natural language task descriptions into structured first-order logic (FOL) representations. Leveraging real mission documentation from NASA's Planetary Data System (2003 to 2013), the authors construct the first NL-to-FOL translation benchmark tailored to planetary exploration, featuring expert-annotated FOL expressions, a structured predicate vocabulary, and typed constants that capture mission phases, temporal structures, agent roles, and operational dependencies. This benchmark fills a critical gap in integrating language understanding with formal reasoning for safety-critical space missions and enables controlled experiments under varying prior knowledge assumptions, thereby providing foundational resources for mission comprehension and reasoning in autonomous agents.
Abstract
Future planetary exploration envisions autonomous robotic agents operating under severe communication constraints, without global positioning, and with minimal human intervention. In such environments, agents must not only perceive and act, but also reason over mission objectives, operational constraints, and evolving environmental conditions. While prior work has largely focused on perception and control, the translation of high-level mission knowledge into structured, machine-interpretable representations remains underexplored.
We introduce a pilot benchmark for translating natural language (NL) into First-Order Logic (FOL) within the domain of planetary exploration. The dataset is constructed from real mission documentation sourced from NASA's Planetary Data System (PDS), spanning missions from 2003 to 2013. These documents describe mission phases such as launch, boost, coast, cruise, and orbital operations in rich natural language. We manually annotate these documents with corresponding FOL representations that capture temporal structure, agent roles, and operational dependencies. In addition, we provide structured predicate vocabularies and typed constants to enable controlled experimentation with varying levels of prior knowledge. This pilot benchmark provides a foundation for research at the intersection of language understanding and formal reasoning, grounded in real-world, safety-critical mission data. The dataset is provided at: https://github.com/HaydenMM/planetary-logic-benchmark/blob/main/pilot_benchmark.json
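To make the role of a predicate vocabulary and typed constants more concrete, the following is a minimal sketch of how a candidate FOL string might be checked against them. Everything in it is an assumption made for illustration: the predicate names (`Phase`, `Precedes`, `Performs`), the constants, the type labels, and the `check_formula` helper are hypothetical and are not taken from the benchmark's actual schema or annotations.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    name: str
    arg_types: tuple  # expected type label for each argument position


# Hypothetical vocabulary and typed constants, invented for illustration only.
VOCAB = {
    "Phase": Predicate("Phase", ("phase",)),
    "Precedes": Predicate("Precedes", ("phase", "phase")),
    "Performs": Predicate("Performs", ("agent", "phase")),
}
CONSTANTS = {
    "launch": "phase",
    "cruise": "phase",
    "spacecraft": "agent",
}

# Matches flat atoms such as Precedes(launch, cruise); nested terms are not handled.
ATOM = re.compile(r"([A-Z]\w*)\(([^()]*)\)")


def check_formula(formula, vocab=VOCAB, constants=CONSTANTS):
    """Return a list of problems found in the atoms of a FOL string."""
    problems = []
    for name, arg_str in ATOM.findall(formula):
        args = [a.strip() for a in arg_str.split(",") if a.strip()]
        pred = vocab.get(name)
        if pred is None:
            problems.append(f"unknown predicate: {name}")
            continue
        if len(args) != len(pred.arg_types):
            problems.append(f"arity mismatch in {name}")
            continue
        for arg, expected in zip(args, pred.arg_types):
            actual = constants.get(arg)
            # Arguments that are not known constants are treated as variables and skipped.
            if actual is not None and actual != expected:
                problems.append(
                    f"type mismatch in {name}: {arg} is {actual}, expected {expected}"
                )
    return problems


if __name__ == "__main__":
    print(check_formula("Phase(launch) & Precedes(launch, cruise)"))
    print(check_formula("Precedes(spacecraft, cruise)"))
```

A check of this kind only validates surface well-formedness against a supplied vocabulary; it says nothing about whether the formula is a faithful translation of the source sentence, which is what the expert annotations are for.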