🤖 AI Summary
While AI agents have made significant advances in communication, collaboration, and tool use, their ability to continuously acquire, process, and update data remains underexplored, which undermines reliability in real-world deployment. To address this, we argue for *data-savvy agents* and propose four core capabilities: proactive data acquisition, sophisticated context-aware data processing, interactive test data synthesis, and continual adaptation. Together, these capabilities are meant to let agents gather task-critical knowledge or solicit human input to close data gaps, handle diverse data challenges flexibly, be evaluated on dynamically generated interactive test data rather than static benchmarks, and iteratively refine their data and background knowledge as environments shift. Whereas current agent research predominantly emphasizes reasoning, this work frames data-savvy capabilities as a top design priority for agentic systems and as the next frontier in data-centric AI.
📝 Abstract
The recent surge in AI agents that autonomously communicate, collaborate with humans, and use diverse tools has unlocked promising opportunities in various real-world settings. However, a vital aspect remains underexplored: how agents handle data. Scalable autonomy demands agents that continuously acquire, process, and evolve their data. In this paper, we argue that data-savvy capabilities should be a top priority in the design of agentic systems to ensure reliable real-world deployment. Specifically, we propose four key capabilities to realize this vision: (1) Proactive data acquisition: enabling agents to autonomously gather task-critical knowledge or solicit human input to address data gaps; (2) Sophisticated data processing: requiring context-aware and flexible handling of diverse data challenges and inputs; (3) Interactive test data synthesis: shifting from static benchmarks to dynamically generated interactive test data for agent evaluation; and (4) Continual adaptation: empowering agents to iteratively refine their data and background knowledge to adapt to shifting environments. While current agent research predominantly emphasizes reasoning, we hope to inspire reflection on the role of data-savvy agents as the next frontier in data-centric AI.