🤖 AI Summary
Existing AI agents struggle to personalize from behavioral traces in local file systems, mainly because of privacy constraints, the difficulty of collecting multimodal real-world data, and an overreliance on explicit user interactions. This work proposes FileGram, a framework that treats file-operation behavior as the core signal for personalization. FileGram comprises three components: FileGramEngine, a persona-driven data engine that simulates realistic workflows; FileGramOS, a memory architecture that encodes traces into procedural, semantic, and episodic channels; and FileGramBench, a diagnostic benchmark for evaluating memory systems on file-system behavioral traces. Experiments show that FileGramBench remains challenging for state-of-the-art memory systems and that FileGramEngine and FileGramOS are effective. The framework is open-sourced to support further research.
📝 Abstract
Coworking AI agents operating within local file systems are rapidly emerging as a paradigm in human-AI interaction. However, effective personalization remains limited by severe data constraints: strict privacy barriers and the difficulty of jointly collecting multimodal real-world traces prevent scalable training and evaluation. Existing methods also remain interaction-centric and overlook the dense behavioral traces in file-system operations. To address this gap, we propose FileGram, a comprehensive framework that grounds agent memory and personalization in file-system behavioral traces. It comprises three core components: (1) FileGramEngine, a scalable persona-driven data engine that simulates realistic workflows and generates fine-grained multimodal action sequences at scale; (2) FileGramBench, a diagnostic benchmark grounded in file-system behavioral traces for evaluating memory systems on profile reconstruction, trace disentanglement, persona drift detection, and multimodal grounding; and (3) FileGramOS, a bottom-up memory architecture that builds user profiles directly from atomic actions and content deltas rather than dialogue summaries, encoding these traces into procedural, semantic, and episodic channels with query-time abstraction. Extensive experiments show that FileGramBench remains challenging for state-of-the-art memory systems and that FileGramEngine and FileGramOS are effective. By open-sourcing the framework, we hope to support future research on personalized, memory-centric file-system agents.
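To make the bottom-up idea concrete, the following is a minimal illustrative sketch of how atomic file actions and content deltas might be routed into procedural, semantic, and episodic channels, with summarization deferred to query time. It is not the FileGramOS implementation: the `FileAction` schema, the `TraceMemory` class, and the choice of operation-to-operation transition counts as the procedural signal are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileAction:
    """One atomic file-system action (hypothetical schema, not from the paper)."""
    step: int        # position of the action in the trace
    op: str          # e.g. "create", "edit", "rename", "delete"
    path: str        # file the action touched
    delta: str = ""  # short description of the content change, if any


@dataclass
class TraceMemory:
    """Toy three-channel store; the channel contents are assumptions."""
    # Procedural: how the user works, as counts of op-to-op transitions.
    procedural: Counter = field(default_factory=Counter)
    # Semantic: what the user works on, as content deltas grouped by file.
    semantic: dict = field(default_factory=lambda: defaultdict(list))
    # Episodic: what happened, kept as the ordered raw actions.
    episodic: list = field(default_factory=list)

    def ingest(self, trace):
        """Store raw actions in all three channels without summarizing them."""
        prev_op = None
        for action in sorted(trace, key=lambda a: a.step):
            if prev_op is not None:
                self.procedural[(prev_op, action.op)] += 1
            if action.delta:
                self.semantic[action.path].append(action.delta)
            self.episodic.append(action)
            prev_op = action.op

    def habits(self, top_k=3):
        """Query-time abstraction: rank transitions only when asked."""
        return self.procedural.most_common(top_k)
```

Calling `ingest` on a list of `FileAction` objects and then `habits()` returns the most frequent operation transitions. Storing raw actions and abstracting only at query time mirrors the contrast the abstract draws with dialogue summaries, which compress before the question is known.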