AstroReason-Bench: Evaluating Unified Agentic Planning across Heterogeneous Space Planning Problems

📅 2026-01-16

🤖 AI Summary

Existing general-purpose agents struggle with real-world space planning tasks characterized by heterogeneous objectives, stringent physical constraints, and long-horizon decision-making. This work proposes a unified evaluation benchmark tailored to such high-stakes, tightly constrained space scheduling problems, exemplified by ground station communication and agile Earth observation, and introduces a standardized agent interaction protocol and a multi-scenario simulation environment. The benchmark is used to evaluate a range of open- and closed-source agentic systems built on large language models. Experimental results show that these general-purpose agents substantially underperform specialized solvers, revealing fundamental limitations in their ability to reason under realistic physical constraints. The benchmark addresses a critical gap in evaluating agent capabilities on complex, real-world planning tasks.

📝 Abstract

Recent advances in agentic Large Language Models (LLMs) have positioned them as generalist planners capable of reasoning and acting across diverse tasks. However, existing agent benchmarks largely focus on symbolic or weakly grounded environments, leaving their performance in physics-constrained real-world domains underexplored. We introduce AstroReason-Bench, a comprehensive benchmark for evaluating agentic planning in Space Planning Problems (SPP), a family of high-stakes problems with heterogeneous objectives, strict physical constraints, and long-horizon decision-making. AstroReason-Bench integrates multiple scheduling regimes, including ground station communication and agile Earth observation, and provides a unified agent-oriented interaction protocol. Evaluating a range of state-of-the-art open- and closed-source agentic LLM systems, we find that current agents substantially underperform specialized solvers, highlighting key limitations of generalist planning under realistic constraints. AstroReason-Bench offers a challenging and diagnostic testbed for future agentic research.
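To make the idea of an agent-oriented interaction protocol over a scheduling regime more concrete, here is a minimal Python sketch of a toy ground station contact scheduler. It is not the benchmark's actual interface: `Window`, `GroundStationEnv`, `observe`, `step`, `score`, and `greedy_agent` are hypothetical names. The 10-minute minimum contact and the "total contact minutes" objective are illustrative assumptions. The sketch shows how an environment can enforce hard physical constraints (visibility windows, one antenna per station, one link per satellite) and reject infeasible actions with a reason, which is the kind of feedback an agent has to plan around. The baseline is a simple earliest-end-time greedy heuristic, not an optimal solver and not the paper's specialized solver.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Window:
    """Hypothetical visibility window: `sat` can see `station` during [start, end) minutes."""
    sat: str
    station: str
    start: int
    end: int


@dataclass
class GroundStationEnv:
    """Toy contact-scheduling environment (illustrative assumption, not the benchmark's API)."""
    windows: list
    min_contact: int = 10                          # assumed minimum contact length, minutes
    booked: list = field(default_factory=list)    # accepted (sat, station, start, end) tuples

    def observe(self):
        # Expose candidate windows and current bookings to the agent.
        return {"windows": list(self.windows), "booked": list(self.booked)}

    def step(self, sat, station, start, end):
        """Try to book a contact; return (accepted, reason)."""
        if end - start < self.min_contact:
            return False, "contact too short"
        # Constraint 1: the contact must lie entirely inside a visibility window.
        if not any(w.sat == sat and w.station == station and w.start <= start and end <= w.end
                   for w in self.windows):
            return False, "outside visibility window"
        # Constraint 2: one antenna per station, one station per satellite at a time.
        for s, g, b, e in self.booked:
            overlaps = start < e and b < end
            if overlaps and (g == station or s == sat):
                return False, "resource conflict"
        self.booked.append((sat, station, start, end))
        return True, "ok"

    def score(self):
        # Assumed objective: total scheduled contact minutes.
        return sum(e - b for _, _, b, e in self.booked)


def greedy_agent(env):
    """Heuristic baseline: visit windows by earliest end time and try to book each whole window."""
    for w in sorted(env.observe()["windows"], key=lambda w: w.end):
        env.step(w.sat, w.station, w.start, w.end)
    return env.score()


if __name__ == "__main__":
    windows = [
        Window("SAT-A", "GS-1", 0, 30),
        Window("SAT-B", "GS-1", 20, 50),   # overlaps SAT-A on GS-1
        Window("SAT-B", "GS-2", 25, 45),
        Window("SAT-C", "GS-2", 40, 45),   # shorter than min_contact
    ]
    env = GroundStationEnv(windows)
    print(greedy_agent(env))  # prints 50
```

Returning a reason string on rejection, rather than silently dropping the action, is one way a unified protocol could give an LLM agent actionable feedback about which physical constraint it violated; swapping `greedy_agent` for an LLM-driven policy would leave the environment unchanged.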

Problem

Research questions and friction points this paper is trying to address.

agentic planning

Space Planning Problems

physical constraints

heterogeneous objectives

long-horizon decision-making

Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic planning

Space Planning Problems

physics-constrained reasoning