Collaborative Agent Reasoning Engineering (CARE): A Three-Party Design Methodology for Systematically Engineering AI Agents with Subject Matter Experts, Developers, and Helper Agents

📅 2026-04-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses three problems in developing large language model (LLM) agents for scientific domains: the lack of a systematic methodology, the mismatch between how domain experts and developers understand domain constraints, and the uneven capabilities that arise from LLMs’ “jagged technological frontier.” The paper proposes a three-party, stage-gated agent engineering methodology that combines structured requirement templates, tool orchestration, and multi-stage validation gates. Subject-matter experts, developers, and LLM-based helper agents collaborate to turn informal intent into auditable, testable, and maintainable agent specifications. An evaluation on a scientific use case reports measurable improvements in development efficiency and complex-query performance, which the authors take as support for the claim that the methodology makes agent behavior specifiable, testable, and maintainable.

📝 Abstract

We present Collaborative Agent Reasoning Engineering (CARE), a disciplined methodology for engineering Large Language Model (LLM) agents in scientific domains. Unlike ad-hoc trial-and-error approaches, CARE specifies behavior, grounding, tool orchestration, and verification through reusable artifacts and systematic, stage-gated phases. The methodology employs a three-party workflow involving Subject-Matter Experts (SMEs), developers, and LLM-based helper agents. These helper agents function as facilitation infrastructure, transforming informal domain intent into structured, reviewable specifications for human approval at defined gates. CARE addresses the "jagged technological frontier", characterized by uneven LLM performance, by bridging the gap between novice and expert analysts regarding domain constraints and verification practices. By generating concrete artifacts, including interaction requirements, reasoning policies, and evaluation criteria, CARE ensures agent behavior is specifiable, testable, and maintainable. Evaluation results from a scientific use case demonstrate that this stage-gated, artifact-driven methodology yields measurable improvements in development efficiency and complex-query performance.
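
To make the stage-gated, artifact-driven idea concrete, the following is a minimal Python sketch of how helper-drafted artifacts might pass through human approval gates. It is an illustration under stated assumptions, not the paper's implementation: the names `Role`, `Artifact`, `Gate`, `approve`, and `run_pipeline`, the two example stages, and the rule that both the SME and the developer must sign off are all hypothetical, since the abstract does not specify them.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    # The three parties named in the abstract.
    SME = "sme"
    DEVELOPER = "developer"
    HELPER_AGENT = "helper_agent"


@dataclass
class Artifact:
    """A reviewable specification artifact (hypothetical structure)."""
    name: str
    content: str
    drafted_by: Role
    approvals: set = field(default_factory=set)


@dataclass(frozen=True)
class Gate:
    """A stage gate that opens only when every required role has approved."""
    stage: str
    required_approvers: frozenset

    def is_open(self, artifact: Artifact) -> bool:
        return self.required_approvers <= artifact.approvals


def approve(artifact: Artifact, role: Role) -> None:
    # Assumption: helper agents draft artifacts, but only humans approve them.
    if role is Role.HELPER_AGENT:
        raise PermissionError("helper agents cannot approve artifacts")
    artifact.approvals.add(role)


def run_pipeline(stages):
    """Walk (gate, artifact) pairs in order; stop at the first closed gate."""
    completed = []
    for gate, artifact in stages:
        if not gate.is_open(artifact):
            break
        completed.append(gate.stage)
    return completed


# Usage: the requirements artifact is fully approved, the reasoning policy is not.
humans = frozenset({Role.SME, Role.DEVELOPER})
requirements = Artifact("interaction requirements", "draft text", Role.HELPER_AGENT)
policy = Artifact("reasoning policy", "draft text", Role.HELPER_AGENT)

approve(requirements, Role.SME)
approve(requirements, Role.DEVELOPER)
approve(policy, Role.DEVELOPER)  # SME sign-off still missing

stages = [
    (Gate("requirements", humans), requirements),
    (Gate("reasoning", humans), policy),
]
completed_stages = run_pipeline(stages)
print(completed_stages)  # ['requirements']
```

In this sketch the pipeline halts at the reasoning stage because the SME has not yet approved the policy artifact, which mirrors the abstract's description of helper agents producing specifications "for human approval at defined gates."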

Problem

Research questions and friction points this paper is trying to address.

Collaborative Agent Reasoning Engineering

Large Language Model agents

systematic engineering

jagged technological frontier

domain constraints

Innovation

Methods, ideas, or system contributions that make the work stand out.

Collaborative Agent Reasoning Engineering

three-party workflow

stage-gated methodology