🤖 AI Summary
This work addresses key challenges in deploying enterprise AI agents—namely, low-quality or scarce data, complex reasoning requirements, difficulties with self-play, and the absence of reliable feedback. To overcome these limitations, we propose a lightweight, model-agnostic offline reinforcement learning framework that integrates a digital twin Markov decision process (DT-MDP) with contrastive inverse reinforcement learning. This approach efficiently recovers reward functions from trajectories of mixed quality and uses them to optimize contextual prompting for large language model (LLM) agents, thereby improving their decision-making. Evaluated on enterprise IT automation tasks, the method consistently and significantly outperforms existing baselines across multiple evaluation settings.
📝 Abstract
Despite rapid progress in AI agents for enterprise automation and decision-making, their real-world deployment and further performance gains remain constrained by limited data quality and quantity, complex real-world reasoning demands, difficulties with self-play, and the lack of reliable feedback signals. To address these challenges, we propose a lightweight, model-agnostic framework for improving LLM-based enterprise agents via offline reinforcement learning (RL). The proposed Context Engineering via DT-MDP (DT-MDP-CE) framework comprises three key components: (1) a Digital-Twin Markov Decision Process (DT-MDP), which abstracts the agent's reasoning behavior as a finite MDP; (2) a robust contrastive inverse RL procedure that, armed with the DT-MDP, efficiently estimates a well-founded reward function and induces policies from mixed-quality offline trajectories; and (3) RL-guided context engineering, which uses the policy obtained from (1) and (2) to improve the agent's decision-making behavior. As a case study, we apply the framework to a representative task in the enterprise-oriented domain of IT automation. Extensive experimental results demonstrate consistent and significant improvements over baseline agents across a wide range of evaluation settings, suggesting that the framework can generalize to other enterprise agents with similar characteristics.
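The three-component pipeline described above can be illustrated with a minimal sketch. Everything below is a toy assumption for exposition, not the paper's implementation: a 4-state, 2-action finite DT-MDP stands in for the abstracted reasoning behavior, the "contrastive inverse RL" step is reduced to a log-ratio score of how often each state-action pair appears in successful versus failed offline trajectories, and "context engineering" is a lookup from the recovered policy's preferred action to a prompt hint.

```python
import numpy as np

# Toy DT-MDP (illustrative assumption): 4 abstract reasoning states,
# 2 actions; state 3 plays the role of "task solved".
N_S, N_A = 4, 2
GAMMA = 0.9

def step(s, a):
    """Deterministic toy dynamics: action 1 advances, action 0 stays."""
    return min(s + 1, N_S - 1) if a == 1 else s

# Mixed-quality offline trajectories: (list of (state, action), outcome).
# In practice the outcome label would come from task-completion signals.
good = [([(0, 1), (1, 1), (2, 1)], 1) for _ in range(20)]
bad = [([(0, 0), (0, 0), (0, 0)], 0) for _ in range(20)]
dataset = good + bad

# Simplified contrastive reward recovery: score each (s, a) by its
# relative frequency in successful vs. failed trajectories
# (add-one smoothing to avoid log(0)).
pos = np.ones((N_S, N_A))
neg = np.ones((N_S, N_A))
for traj, label in dataset:
    for s, a in traj:
        (pos if label else neg)[s, a] += 1
reward = np.log(pos / pos.sum()) - np.log(neg / neg.sum())

# Induce a policy on the DT-MDP via value iteration on the recovered reward.
V = np.zeros(N_S)
for _ in range(100):
    Q = np.array([[reward[s, a] + GAMMA * V[step(s, a)]
                   for a in range(N_A)] for s in range(N_S)])
    V = Q.max(axis=1)
policy = Q.argmax(axis=1)

# RL-guided context engineering (hypothetical hints): translate the
# policy's preferred action in the current abstract state into a prompt
# fragment for the LLM agent.
HINTS = {0: "Re-examine the current diagnosis before acting.",
         1: "Proceed to the next remediation step."}

def context_hint(state):
    return HINTS[int(policy[state])]
```

Under these toy labels the recovered reward favors the advancing action in the early states, so `context_hint(0)` returns the "proceed" prompt; the point is only to show how the three components compose, not to reproduce the paper's estimator.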