🤖 AI Summary
To address the bottleneck that knowledge in deep neural networks is rigidly encoded in weights and therefore hard to edit dynamically, this paper proposes a visual memory framework that decouples image classification into similarity matching with a pretrained embedding and nearest-neighbor retrieval from an external memory bank. Methodologically, it introduces the first real-time knowledge insertion/deletion, unlearning, and memory-pruning capabilities for deep vision models, establishing an interpretable decision mechanism that supports direct intervention and scales from individual samples to billion-scale memory. Core technical innovations include a lightweight memory database architecture, efficient approximate nearest-neighbor (ANN) search, and differentiable memory update strategies. Evaluated across multi-scale benchmarks, the framework achieves high-accuracy classification while enabling decision-attribution visualization and millisecond-level sample editing, significantly enhancing model controllability, adaptability, and maintainability.
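The decoupling described above can be illustrated with a minimal sketch. This is not the paper's code: `VisualMemory`, `embed_fn`, and the method names are illustrative assumptions, and the exact cosine-similarity search below stands in for the ANN index a billion-scale memory would require.

```python
import math

class VisualMemory:
    """Toy visual memory: a frozen embedding plus a searchable database of labeled vectors."""

    def __init__(self, embed_fn, k=5):
        self.embed_fn = embed_fn      # pretrained, frozen image -> vector map (assumed given)
        self.k = k                    # number of neighbors consulted per decision
        self.entries = []             # list of (unit vector, label) pairs

    @staticmethod
    def _normalize(v):
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    def insert(self, image, label):
        """Adding knowledge is a database write, not a training run."""
        self.entries.append((self._normalize(self.embed_fn(image)), label))

    def delete(self, index):
        """Unlearning a sample amounts to removing its row from memory."""
        self.entries.pop(index)

    def classify(self, image):
        """Exact cosine-similarity search; at scale this would be swapped for ANN search."""
        q = self._normalize(self.embed_fn(image))
        scored = sorted(
            ((sum(a * b for a, b in zip(v, q)), label) for v, label in self.entries),
            reverse=True,
        )[: self.k]
        votes = {}
        for sim, label in scored:
            votes[label] = votes.get(label, 0.0) + sim
        return max(votes, key=votes.get)
```

Because predictions come from explicit memory rows, editing the model's knowledge is an `insert` or `delete` on the database rather than a retraining job, which is what makes millisecond-level sample editing possible.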
📝 Abstract
Training a neural network is a monolithic endeavor, akin to carving knowledge into stone: once the process is completed, editing the knowledge in a network is nearly impossible, since all information is distributed across the network's weights. We here explore a simple, compelling alternative by marrying the representational power of deep neural networks with the flexibility of a database. Decomposing the task of image classification into image similarity (from a pre-trained embedding) and search (via fast nearest neighbor retrieval from a knowledge database), we build a simple and flexible visual memory that has the following key capabilities: (1.) The ability to flexibly add data across scales: from individual samples all the way to entire classes and billion-scale data; (2.) The ability to remove data through unlearning and memory pruning; (3.) An interpretable decision-mechanism on which we can intervene to control its behavior. Taken together, these capabilities comprehensively demonstrate the benefits of an explicit visual memory. We hope that it might contribute to a conversation on how knowledge should be represented in deep vision models -- beyond carving it in "stone" weights.
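Capability (3.), the interpretable decision mechanism, follows from the retrieval formulation: the neighbors that voted for a prediction are the explanation, and intervening means editing those memories. The sketch below is a hypothetical illustration (the function names and the `(similarity, label, sample_id)` neighbor format are assumptions, not the paper's API).

```python
def predict_with_evidence(neighbors, k=3):
    """Predict from retrieved neighbors and return the memory entries that support the decision.

    neighbors: list of (similarity, label, sample_id) tuples from the memory search.
    """
    top = sorted(neighbors, reverse=True)[:k]          # highest-similarity neighbors first
    votes = {}
    for sim, label, _ in top:
        votes[label] = votes.get(label, 0.0) + sim     # similarity-weighted voting
    winner = max(votes, key=votes.get)
    evidence = [sid for sim, label, sid in top if label == winner]
    return winner, evidence

def intervene(neighbors, bad_ids):
    """Memory pruning as intervention: drop entries traced to a bad source, then re-decide."""
    return [n for n in neighbors if n[2] not in bad_ids]
```

If attribution reveals that a decision rests on mislabeled or unwanted samples, pruning exactly those entries changes the model's behavior immediately, with no gradient updates involved.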