A Context Engineering Framework for Improving Enterprise AI Agents based on Digital-Twin MDP

📅 2026-03-23
🤖 AI Summary
This work addresses key challenges in deploying enterprise AI agents—namely, low-quality or scarce data, complex reasoning requirements, difficulties in self-play, and the absence of reliable feedback. To overcome these limitations, we propose a lightweight, model-agnostic offline reinforcement learning framework that innovatively integrates a digital twin Markov decision process (DT-MDP) with contrastive inverse reinforcement learning. This approach efficiently recovers reward functions from trajectories of mixed quality and leverages them to optimize contextual prompting for large language model (LLM) agents, thereby enhancing decision-making. Evaluated on enterprise IT automation tasks, our method consistently and significantly outperforms existing baselines across multiple evaluation settings.

📝 Abstract
Despite rapid progress in AI agents for enterprise automation and decision-making, their real-world deployment and further performance gains remain constrained by limited data quality and quantity, complex real-world reasoning demands, difficulties with self-play, and the lack of reliable feedback signals. To address these challenges, we propose a lightweight, model-agnostic framework for improving LLM-based enterprise agents via offline reinforcement learning (RL). The proposed Context Engineering via DT-MDP (DT-MDP-CE) framework comprises three key components: (1) a Digital-Twin Markov Decision Process (DT-MDP), which abstracts the agent's reasoning behavior as a finite MDP; (2) a robust contrastive inverse RL method, which, armed with the DT-MDP, efficiently estimates a well-founded reward function and induces policies from mixed-quality offline trajectories; and (3) RL-guided context engineering, which uses the policy obtained from the integrated process of (1) and (2) to improve the agent's decision-making behavior. As a case study, we apply the framework to a representative task in the enterprise-oriented domain of IT automation. Extensive experimental results demonstrate consistent and significant improvements over baseline agents across a wide range of evaluation settings, suggesting that the framework can generalize to other agents sharing similar characteristics in enterprise environments.
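
The three components can be pictured with a toy sketch. This is not the paper's implementation: the states, actions, transitions, trajectories, and the log-ratio reward below are all hypothetical stand-ins, chosen only to show how a finite MDP, a reward contrasted from good versus bad trajectories, and a policy-derived prompt hint might fit together.

```python
import math
from collections import Counter

# Hypothetical abstraction of an IT-automation agent's reasoning stages.
STATES = ["triage", "diagnose", "remediate", "done"]
ACTIONS = ["inspect_logs", "run_diagnostic", "apply_fix"]

# Toy deterministic transitions (an assumption, not taken from the paper).
TRANSITIONS = {
    ("triage", "inspect_logs"): "diagnose",
    ("triage", "run_diagnostic"): "triage",
    ("triage", "apply_fix"): "triage",
    ("diagnose", "inspect_logs"): "diagnose",
    ("diagnose", "run_diagnostic"): "remediate",
    ("diagnose", "apply_fix"): "triage",
    ("remediate", "inspect_logs"): "remediate",
    ("remediate", "run_diagnostic"): "remediate",
    ("remediate", "apply_fix"): "done",
}


def contrastive_reward(good, bad, smoothing=1.0):
    """Stand-in reward: log-ratio of smoothed (state, action) frequencies
    in higher-quality versus lower-quality trajectories."""
    g = Counter(sa for traj in good for sa in traj)
    b = Counter(sa for traj in bad for sa in traj)
    g_total = sum(g.values()) + smoothing * len(TRANSITIONS)
    b_total = sum(b.values()) + smoothing * len(TRANSITIONS)
    return {
        sa: math.log((g[sa] + smoothing) / g_total)
        - math.log((b[sa] + smoothing) / b_total)
        for sa in TRANSITIONS
    }


def greedy_policy(reward, gamma=0.9, sweeps=100):
    """Value iteration on the finite MDP, then a greedy policy per state."""
    value = {s: 0.0 for s in STATES}

    def q(s, a):
        return reward[(s, a)] + gamma * value[TRANSITIONS[(s, a)]]

    for _ in range(sweeps):
        for s in STATES:
            if s != "done":  # "done" is terminal and keeps value 0
                value[s] = max(q(s, a) for a in ACTIONS)
    return {s: max(ACTIONS, key=lambda a: q(s, a)) for s in STATES if s != "done"}


def context_hint(policy, state):
    """Turn the policy's recommendation into text for the agent's prompt."""
    return f"Current stage: {state}. Suggested next action: {policy[state]}."


# Hypothetical mixed-quality offline trajectories.
GOOD = [[("triage", "inspect_logs"), ("diagnose", "run_diagnostic"),
         ("remediate", "apply_fix")]] * 2
BAD = [
    [("triage", "apply_fix"), ("triage", "run_diagnostic")],
    [("triage", "inspect_logs"), ("diagnose", "apply_fix"), ("triage", "apply_fix")],
]

policy = greedy_policy(contrastive_reward(GOOD, BAD))
print(context_hint(policy, "diagnose"))
```

In this sketch the hint string would be appended to the LLM agent's context at each step; how the actual framework phrases or injects its guidance is not specified in the abstract.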
Problem

Research questions and friction points this paper is trying to address.

enterprise AI agents
data quality
real-world reasoning
reliable feedback
offline reinforcement learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Digital-Twin MDP
Offline Reinforcement Learning
Context Engineering
Inverse Reinforcement Learning
LLM-based Agents
Xi Yang
IBM Software Innovation Lab, Yorktown Heights, New York, USA
Aurélie Lozano
IBM Software Innovation Lab, Yorktown Heights, New York, USA
Naoki Abe
IBM Software Innovation Lab, Yorktown Heights, New York, USA
Bhavya
Research Scientist, IBM Research
AI, Natural Language Processing, Text Mining
Saurabh Jha
Sr. Research Scientist, IBM
ML for Systems, Systems for ML, Reliability
Noah Zheutlin
IBM Software Innovation Lab, Yorktown Heights, New York, USA
Rohan R. Arora
IBM Software Innovation Lab, Yorktown Heights, New York, USA
Yu Deng
IBM Software Innovation Lab, Yorktown Heights, New York, USA
Daby M. Sow
IBM Software Innovation Lab, Yorktown Heights, New York, USA