CAX-Agent: A Lightweight Agent Harness for Reliable APDL Automation

📅 2026-05-12

🤖 AI Summary

This study addresses the unreliability of large language models (LLMs) in MAPDL-based finite-element simulation: without structured execution control, tool encapsulation, and fault recovery, outputs are inconsistent and task failures are common. The authors propose CAX-Agent, a lightweight agent harness with a three-layer architecture (an LLM service layer, an agent harness layer, and a solver backend) that manages tool lifecycles, workflow state, and recovery escalation. Its recovery ladder escalates from deterministic rule patching through model-driven regeneration to context enrichment and, if necessary, human intervention. On 50 structural benchmark cases with three repeated runs per strategy, the model_only strategy achieves a 92.67% task completion rate, an average task score of 3.59 out of 4, and an 84% zero-intervention rate, outperforming the rule_only and no_recovery strategies with large effect sizes (Cliff’s delta = 0.81–0.87). The benchmark uses deliberately simple geometries to isolate recovery-policy effects, so broader validation remains future work.
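The escalation order described above can be pictured with a minimal Python sketch. This is not the CAX-Agent implementation: the function names (`run_with_recovery`, `toy_solve`, `rule_patch`, `model_regenerate`, `enrich_context`, `ask_human`), the error strings, and the stand-in solver are all hypothetical, and a real harness would call an LLM service and a MAPDL backend instead of the canned values used here.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical types: a solver returns None on success or an error string;
# a rung takes the failing script and its error and returns a new script.
Solve = Callable[[str], Optional[str]]
Rung = Callable[[str, str], str]


def run_with_recovery(script: str, solve: Solve, rungs: List[Tuple[str, Rung]]):
    """Run the script; on failure, climb the ladder one rung at a time."""
    error = solve(script)
    history = [("initial", error)]
    for name, rung in rungs:
        if error is None:  # stop escalating as soon as the solver accepts
            break
        script = rung(script, error)
        error = solve(script)
        history.append((name, error))
    return script, error is None, history


def toy_solve(script: str) -> Optional[str]:
    # Stand-in for the solver backend (assumption): only checks that the
    # SOLVE and FINISH commands appear somewhere in the script.
    commands = script.split("\n")
    for required in ("SOLVE", "FINISH"):
        if required not in commands:
            return f"missing {required}"
    return None


CANNED_SCRIPT = "/PREP7\n/SOLU\nSOLVE\nFINISH"  # placeholder for generated output


def rule_patch(script: str, error: str) -> str:
    # Deterministic rule: this toy rule only knows how to append FINISH.
    return script + "\nFINISH" if error == "missing FINISH" else script


def model_regenerate(script: str, error: str) -> str:
    return CANNED_SCRIPT  # placeholder for an LLM regeneration call


def enrich_context(script: str, error: str) -> str:
    return CANNED_SCRIPT  # placeholder for regeneration with added context


def ask_human(script: str, error: str) -> str:
    return CANNED_SCRIPT  # placeholder for a manual fix


LADDER = [
    ("rule_patch", rule_patch),
    ("model_regenerate", model_regenerate),
    ("context_enrichment", enrich_context),
    ("human_intervention", ask_human),
]

# Case 1: the rule repairs the script, so no further escalation happens.
_, ok_rule, hist_rule = run_with_recovery("/PREP7\nSOLVE", toy_solve, LADDER)
# Case 2: the rule cannot help, so the ladder escalates to regeneration.
_, ok_model, hist_model = run_with_recovery("/PREP7", toy_solve, LADDER)
```

In this sketch the history list records which rung finally succeeded, which is the kind of information a zero-intervention rate would be computed from (a run counts as zero-intervention only if the human rung is never reached).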

📝 Abstract

Large language models deployed for MAPDL finite-element simulation face practical reliability challenges: without structured execution control, tool encapsulation, and fault recovery, outputs may be inconsistent and task failures are common. The Agent Harness paradigm addresses this by inserting domain-specific orchestration middleware that manages tool lifecycles, workflow state, and recovery escalation. This paper presents the architecture of CAX-Agent, a lightweight agent harness purpose-built for MAPDL automation, and empirically evaluates one of its core components -- the recovery policy. CAX-Agent organizes execution into three layers -- LLM service, agent harness, and solver backend -- with a recovery ladder that escalates from deterministic rule patching through model-driven regeneration to context enrichment and human intervention. We evaluate three recovery strategies (no_recovery, rule_only, and model_only) on 50 standard structural benchmarks with three repeated runs per strategy (450 case-runs total). Two independent human raters score task completion under blind conditions; inter-rater agreement is strong (quadratic weighted Cohen's kappa = 0.84, 96 percent of score pairs within one point). Model_only achieves the best completion rate (0.9267), task score (3.59/4), total score (9.16/10), and zero-intervention rate (0.84), outperforming rule_only (0.7733, 3.17/4, 7.03/10, 0.00) and no_recovery (0.6933, 2.74/4, 5.60/10, 0.00) with large effect sizes (Cliff's delta = 0.81-0.87). The benchmark uses deliberately simple geometries to isolate recovery-policy effects; we discuss the scope of these findings and directions for broader validation.
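For readers unfamiliar with the effect size reported above, Cliff's delta compares two groups of scores pairwise: it is the fraction of pairs in which the first group scores higher minus the fraction in which it scores lower, so it ranges from -1 to 1. The sketch below uses the standard definition; the function name `cliffs_delta` and the sample scores are illustrative assumptions, not data from the paper.

```python
from typing import Sequence


def cliffs_delta(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Cliff's delta: (#pairs with x > y - #pairs with x < y) / (len(xs) * len(ys))."""
    if not xs or not ys:
        raise ValueError("both groups must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Made-up task scores on a 0-4 scale, for illustration only.
with_recovery = [4, 4, 3, 4]
without_recovery = [2, 3, 1, 2]

delta = cliffs_delta(with_recovery, without_recovery)
```

Because the measure depends only on the ordering of scores, it suits ordinal ratings such as the 0-4 task scores used here, where differences between score levels need not be evenly spaced.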

Problem

Research questions and friction points this paper is trying to address.

LLM reliability

APDL automation

fault recovery

agent harness

finite-element simulation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Agent Harness

Recovery Policy

MAPDL Automation