Multi-Agent Collaborative Framework for Intelligent IT Operations: An AOI System with Context-Aware Compression and Dynamic Task Scheduling

📅 2025-12-15
🤖 AI Summary
Dynamic microservice orchestration in cloud-native environments causes operational data explosion, leading to information overload, inefficient multi-task coordination, and fragmented contextual continuity for fault diagnosis. To address these challenges, we propose AOI, a multi-agent collaborative framework featuring a novel three-tier memory architecture—integrating working memory, episodic memory, and semantic memory—and incorporating an LLM-driven, context-aware compression mechanism alongside a real-time state-guided dynamic task prioritization scheduler. This design ensures contextual continuity and adaptive decision-making. Experimental evaluation demonstrates a 72.4% context compression rate with 92.8% critical information retention; task success rate reaches 94.2%, and mean time to repair (MTTR) is reduced by 34.4%. AOI establishes a scalable, interpretable collaborative paradigm for intelligent cloud-native operations.

📝 Abstract
The proliferation of cloud-native architectures, characterized by microservices and dynamic orchestration, has made modern IT infrastructures highly complex and volatile. This complexity generates overwhelming volumes of operational data, which creates critical bottlenecks in conventional systems: inefficient information processing, poor task coordination, and loss of contextual continuity during fault diagnosis and remediation. To address these challenges, we propose AOI (AI-Oriented Operations), a novel multi-agent collaborative framework that integrates three specialized agents with an LLM-based Context Compressor. Its core innovations include: (1) a dynamic task scheduling strategy that adaptively prioritizes operations based on real-time system states, and (2) a three-layer memory architecture comprising Working, Episodic, and Semantic layers that optimizes context retention and retrieval. Extensive experiments on both synthetic and real-world benchmarks demonstrate that AOI effectively mitigates information overload, achieving a 72.4% context compression ratio while preserving 92.8% of critical information. AOI also significantly enhances operational efficiency, attaining a 94.2% task success rate and reducing the Mean Time to Repair (MTTR) by 34.4% compared to the best baseline. This work presents a shift towards scalable, adaptive, and context-aware autonomous operations, enabling robust management of next-generation IT infrastructures with minimal human intervention.
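The abstract reports a compression ratio and a relative MTTR reduction without giving formulas. The sketch below shows one common way such percentages are computed. The definitions (fraction of tokens removed, and reduction relative to a baseline) are assumptions, and the token counts and repair times are hypothetical values chosen only to reproduce the reported percentages, not data from the paper.

```python
def compression_ratio(original_tokens: int, compressed_tokens: int) -> float:
    """Fraction of the original context removed by compression (assumed definition)."""
    return 1.0 - compressed_tokens / original_tokens


def relative_reduction(baseline: float, improved: float) -> float:
    """Reduction of a metric (e.g. MTTR) relative to a baseline value."""
    return (baseline - improved) / baseline


# Hypothetical inputs, picked so the results match the reported figures.
ratio = compression_ratio(original_tokens=1000, compressed_tokens=276)
mttr_cut = relative_reduction(baseline=100.0, improved=65.6)
print(f"compression ratio: {ratio:.1%}, MTTR reduction: {mttr_cut:.1%}")
```

Under these definitions, a 72.4% ratio means roughly 276 of every 1000 context tokens survive compression.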
Problem

Research questions and friction points this paper is trying to address.

Addresses IT operational data overload and processing inefficiencies
Solves poor task coordination and contextual loss in fault management
Mitigates complexity in cloud-native IT infrastructure operations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent framework with LLM-based context compression
Dynamic task scheduling based on real-time system states
Three-layer memory architecture for optimized context retention
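The state-driven scheduling idea listed above can be illustrated with a minimal sketch. This is not the paper's scheduler: the scoring rule, the task fields (`name`, `service`, `severity`), and the per-service load map are all hypothetical, chosen only to show how the same task queue can be reordered when the observed system state changes.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class _Entry:
    neg_priority: float            # negated so the min-heap pops the highest priority first
    seq: int                       # insertion order, breaks ties deterministically
    name: str = field(compare=False)


def priority(task: dict, state: dict) -> float:
    """Hypothetical scoring rule: severity scaled up by load on the affected service."""
    load = state.get(task["service"], 0.0)  # assumed utilisation in [0.0, 1.0]
    return task["severity"] * (1.0 + load)


def schedule(tasks: list[dict], state: dict) -> list[str]:
    """Return task names ordered from highest to lowest priority for the given state."""
    tie = count()
    heap = [_Entry(-priority(t, state), next(tie), t["name"]) for t in tasks]
    heapq.heapify(heap)
    return [heapq.heappop(heap).name for _ in range(len(heap))]


tasks = [
    {"name": "restart-cache", "service": "cache", "severity": 2},
    {"name": "scale-api", "service": "api", "severity": 3},
    {"name": "rotate-logs", "service": "logs", "severity": 1},
]

# Same tasks, two different system states: the order changes with the load.
calm = schedule(tasks, {"cache": 0.1, "api": 0.2, "logs": 0.0})
cache_hot = schedule(tasks, {"cache": 0.9, "api": 0.1})
print(calm)
print(cache_hot)
```

Recomputing priorities from a fresh state snapshot on each scheduling pass keeps the example simple. A production scheduler would more likely update priorities incrementally as state events arrive.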
Zishan Bai
Columbia University, New York, NY, USA
Enze Ge
University of Bologna, Bologna, Italy
Junfeng Hao
Affiliated Hospital of Guangdong Medical University, Hemodialysis Center, Chief Physician
Nephrology, hemodialysis, hemodialysis vascular access