🤖 AI Summary
This work addresses the limited reliability of existing agent systems in complex scientific visualization workflows, where they often execute invalid operations, introduce subtle errors, or fail to handle requests with missing information. To overcome these challenges, the authors propose a highly reliable conversational automation framework tailored for topological data analysis and visualization. The framework employs a two-agent architecture, an orchestrator and a verifier, that decouples workflow generation from verification to ensure both structural validity and semantic consistency. It incorporates a taxonomy of failure modes with targeted safeguards for each class, and leverages modular backend interfaces to enable flexible extension without modifying the core system. Evaluated on a benchmark of 1,000 multi-turn dialogues across 100 prompts, including adversarial and infeasible requests, the framework achieves a success rate above 99%, substantially outperforming baseline approaches that lack comprehensive protective mechanisms (under 50%).
📝 Abstract
Recent agentic systems demonstrate that large language models can generate scientific visualizations from natural language. However, reliability remains a major limitation: systems may execute invalid operations, introduce subtle but consequential errors, or fail to request missing information when inputs are underspecified. These issues are amplified in real-world workflows, which often exceed the complexity of standard benchmarks. Ensuring reliability in autonomous visualization pipelines therefore remains an open challenge. We present TopoPilot, a reliable and extensible agentic framework for automating complex scientific visualization workflows. TopoPilot incorporates systematic guardrails and verification mechanisms to ensure reliable operation. While we focus on topological data analysis and visualization as a primary use case, the framework is designed to generalize across visualization domains. TopoPilot adopts a reliability-centered two-agent architecture. An orchestrator agent translates user prompts into workflows composed of atomic backend actions, while a verifier agent evaluates these workflows prior to execution, enforcing structural validity and semantic consistency. This separation of interpretation and verification reduces code-generation errors and enforces correctness guarantees. A modular architecture further improves robustness by isolating components and enabling seamless integration of new descriptors and domain-specific workflows without modifying the core system. To systematically address reliability, we introduce a taxonomy of failure modes and implement targeted safeguards for each class. In evaluations simulating 1,000 multi-turn conversations across 100 prompts, including adversarial and infeasible requests, TopoPilot achieves a success rate exceeding 99%, compared to under 50% for baselines without comprehensive guardrails and checks.
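The orchestrator/verifier separation described above can be illustrated with a minimal sketch. Everything here is invented for illustration: the action names, the consistency rule, and the function signatures are hypothetical stand-ins, not TopoPilot's actual backend API. The point is only the control flow, where the verifier checks both structural validity (every step is a known atomic action) and semantic consistency (steps appear in a feasible order) before anything executes.

```python
from dataclasses import dataclass, field

# Hypothetical set of atomic backend actions (not TopoPilot's real ones).
VALID_ACTIONS = {"load_data", "compute_persistence", "render_diagram"}

@dataclass
class Workflow:
    steps: list = field(default_factory=list)

def orchestrate(prompt: str) -> Workflow:
    """Toy orchestrator: map keywords in the user prompt to atomic actions.
    A real system would use an LLM here."""
    steps = ["load_data"]
    if "persistence" in prompt:
        steps.append("compute_persistence")
    if "diagram" in prompt or "plot" in prompt:
        steps.append("render_diagram")
    return Workflow(steps)

def verify(wf: Workflow) -> tuple[bool, str]:
    """Toy verifier: reject unknown actions (structural validity) and
    rendering without a computed descriptor (semantic consistency)."""
    for step in wf.steps:
        if step not in VALID_ACTIONS:
            return False, f"unknown action: {step}"
    if "render_diagram" in wf.steps and "compute_persistence" not in wf.steps:
        return False, "render_diagram requires compute_persistence first"
    return True, "ok"

def run(prompt: str) -> str:
    wf = orchestrate(prompt)
    ok, msg = verify(wf)  # verification happens before execution
    if not ok:
        return f"rejected: {msg}"
    return "executed: " + " -> ".join(wf.steps)
```

Under this sketch, a well-specified request passes both checks, while an underspecified one is rejected before execution rather than failing midway, which is the behavior the guardrails are meant to guarantee.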