Don't Let AI Agents YOLO Your Files: Shifting Information and Control to Filesystems for Agent Safety and Autonomy

📅 2026-04-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of keeping AI coding agents both safe and autonomous when they operate on users' filesystems, where they regularly corrupt data, delete files, and leak secrets. Existing approaches force a tradeoff: unrestricted access risks harm, while frequent permission prompts burden users and block agents. Drawing on a study of 290 public misuse reports across 13 frameworks, the paper introduces YoloFS, an agent-native filesystem that shifts information and control over filesystem effects into the filesystem itself. YoloFS relies on three techniques: staging, which isolates all mutations before commit so users can correct them; snapshots, which let agents detect and correct their own mistakes; and progressive permission, which gates access with minimal user interaction. In the evaluation, YoloFS enables agent self-correction in 8 of 11 tasks with hidden side effects while keeping all effects staged and reviewable, and on 112 routine tasks it requires fewer user interactions while matching the baseline success rate.

📝 Abstract

AI coding agents operate directly on users' filesystems, where they regularly corrupt data, delete files, and leak secrets. Current approaches force a tradeoff between safety and autonomy: unrestricted access risks harm, while frequent permission prompts burden users and block agents. To understand this problem, we conduct the first systematic study of agent filesystem misuse, analyzing 290 public reports across 13 frameworks. Our analysis reveals that today's agents have limited information about their filesystem effects and insufficient control over them. We therefore argue for shifting this information and control to the filesystem itself. Based on this principle, we design YoloFS, an agent-native filesystem with three techniques. Staging isolates all mutations before commit, giving users corrective control. Snapshots extend this control to agents, letting them detect and correct their own mistakes. Progressive permission provides users with preventive control by gating access with minimal interaction. To evaluate YoloFS, we introduce a new methodology that captures user-agent-filesystem interactions. On 11 tasks with hidden side effects, YoloFS enables agent self-correction in 8 while keeping all effects staged and reviewable. On 112 routine tasks, YoloFS requires fewer user interactions while matching the baseline success rate.
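To make the three techniques in the abstract more concrete, the following is a minimal in-memory sketch, not the YoloFS implementation. Every name in it (`StagedFS`, `ask_user`, `snapshot`, `rollback`, `diff`, `commit`) is hypothetical, and granting permission once per top-level directory is an assumption made for illustration; the abstract only states that access is gated with minimal interaction.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

_DELETED = object()  # tombstone marking a staged deletion


@dataclass
class StagedFS:
    """Toy model of staging, snapshots, and progressive permission (hypothetical)."""
    base: Dict[str, str]                 # committed state: path -> content
    ask_user: Callable[[str], bool]      # hypothetical prompt callback
    staged: Dict[str, object] = field(default_factory=dict)
    granted: Set[str] = field(default_factory=set)
    snapshots: List[Dict[str, object]] = field(default_factory=list)
    prompts: int = 0

    def _check(self, path: str) -> None:
        # Progressive permission (assumed policy): prompt once per
        # top-level directory, then remember the grant.
        scope = path.split("/", 1)[0]
        if scope not in self.granted:
            self.prompts += 1
            if not self.ask_user(scope):
                raise PermissionError(f"access to {scope!r} denied")
            self.granted.add(scope)

    def read(self, path: str) -> str:
        self._check(path)
        value = self.staged.get(path, self.base.get(path))
        if value is None or value is _DELETED:
            raise FileNotFoundError(path)
        return value

    def write(self, path: str, content: str) -> None:
        self._check(path)
        self.staged[path] = content      # staging: base is not touched

    def delete(self, path: str) -> None:
        self._check(path)
        self.staged[path] = _DELETED

    def snapshot(self) -> int:
        # Snapshots: record the staged state so the agent can roll back.
        self.snapshots.append(dict(self.staged))
        return len(self.snapshots) - 1

    def rollback(self, snap_id: int) -> None:
        self.staged = dict(self.snapshots[snap_id])

    def diff(self) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
        # Reviewable summary: path -> (committed content, staged content);
        # a staged content of None means the file would be deleted.
        return {
            p: (self.base.get(p), None if v is _DELETED else v)
            for p, v in self.staged.items()
        }

    def commit(self) -> None:
        for path, value in self.staged.items():
            if value is _DELETED:
                self.base.pop(path, None)
            else:
                self.base[path] = value
        self.staged.clear()
        self.snapshots.clear()


# Usage: the user grants "src" but refuses "secrets".
fs = StagedFS(
    base={"src/app.py": "print('v1')", "secrets/.env": "KEY=abc"},
    ask_user=lambda scope: scope != "secrets",
)
snap = fs.snapshot()
fs.write("src/app.py", "print('v2')")
fs.delete("src/app.py")                  # agent mistake
fs.rollback(snap)                        # agent self-corrects
fs.write("src/app.py", "print('v2')")
review = fs.diff()                       # user reviews staged effects
fs.commit()
```

In this sketch nothing reaches `base` until `commit()` is called, so both the user (via `diff()`) and the agent (via `rollback()`) can correct mistakes first, and the user is prompted only once for the `src` directory regardless of how many operations touch it.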

Problem

Research questions and friction points this paper is trying to address.

AI agents

filesystem safety

autonomy

data corruption

permission control

Innovation

Methods, ideas, or system contributions that make the work stand out.

YoloFS

staging

snapshots