🤖 AI Summary
This paper addresses the problem of model-agnostic generalization to unseen categories in image classification without retraining the base model. We propose a Memory-Modular Image Classifier that decouples knowledge storage (an external multimodal vision-language memory) from inference, enabling zero-shot, few-shot, fine-grained, and class-incremental classification solely through external memory replacement. Our method leverages joint vision-language representations, dynamic memory retrieval, and meta-learned noise-augmented data generation to construct a cacheable, swappable external knowledge module. To our knowledge, this is the first work to establish a paradigm in which cross-category generalization is driven purely by updating memory contents. Extensive experiments demonstrate significant performance gains over fine-tuning and prompt-engineering baselines across diverse classification tasks, while requiring zero parameter updates, ensuring strong adaptability and robustness across domains and deployment scenarios.
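The summary's central claim, that adaptation happens solely through memory replacement while the model's weights stay frozen, can be illustrated with a minimal sketch. The class below is a hypothetical stand-in, not the paper's implementation: `memory_feats` and `memory_labels` represent precomputed embeddings and labels of web-crawled memory entries, and classification is a nearest-neighbor lookup against them.

```python
import numpy as np

class MemoryModularClassifier:
    """Minimal sketch: classification driven entirely by swappable memory.

    The encoder is assumed frozen elsewhere; here we work directly with
    embeddings. Replacing the memory is the only form of "update".
    """

    def __init__(self, memory_feats, memory_labels):
        self.replace_memory(memory_feats, memory_labels)

    def replace_memory(self, memory_feats, memory_labels):
        # Swap in new memory contents; no model parameters change.
        norms = np.linalg.norm(memory_feats, axis=1, keepdims=True)
        self.feats = memory_feats / np.clip(norms, 1e-8, None)
        self.labels = np.asarray(memory_labels)

    def predict(self, query_feat):
        # Cosine similarity to every memory entry; return the best match.
        q = query_feat / max(np.linalg.norm(query_feat), 1e-8)
        return self.labels[int(np.argmax(self.feats @ q))]
```

Under this framing, moving to a new set of categories amounts to one `replace_memory` call with embeddings of the new classes' web data.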
📝 Abstract
We propose a novel memory-modular learner for image classification that separates knowledge memorization from reasoning. Our model enables effective generalization to new classes by simply replacing the memory contents, without the need for model retraining. Unlike traditional models that encode both world knowledge and task-specific skills into their weights during training, our model stores knowledge in an external memory of web-crawled image and text data. At inference time, the model dynamically selects relevant content from the memory based on the input image, allowing it to adapt to arbitrary classes simply by swapping in new memory contents. The key differentiator is that our learner meta-learns to perform classification with noisy web data from unseen classes, resulting in robust performance across various classification scenarios. Experimental results demonstrate the promising performance and versatility of our approach across diverse classification tasks, including zero-shot/few-shot classification of unseen classes, fine-grained classification, and class-incremental classification.
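The inference-time behavior the abstract describes, dynamically selecting relevant memory content for an input image and tolerating label noise in web data, can be sketched as similarity-weighted retrieval. All names below (`retrieve_and_classify`, `tau`, `k`) are illustrative assumptions, not the paper's API: the idea is to keep the top-k most similar memory entries and softmax-aggregate their possibly noisy labels rather than trust any single neighbor.

```python
import numpy as np

def retrieve_and_classify(query, mem_feats, mem_labels, n_classes, k=3, tau=0.07):
    """Hypothetical sketch of dynamic memory retrieval at inference time.

    Scores memory entries by cosine similarity to the query embedding,
    keeps the top-k, and aggregates their labels with softmax weights,
    which softens the impact of individual noisy web-crawled labels.
    """
    q = query / max(np.linalg.norm(query), 1e-8)
    norms = np.linalg.norm(mem_feats, axis=1, keepdims=True)
    m = mem_feats / np.clip(norms, 1e-8, None)
    sims = m @ q
    topk = np.argsort(sims)[-k:]                    # k most relevant entries
    w = np.exp(sims[topk] / tau)
    w /= w.sum()                                    # softmax over similarities
    probs = np.zeros(n_classes)
    for weight, label in zip(w, mem_labels[topk]):  # weighted label vote
        probs[label] += weight
    return probs
```

A temperature like `tau=0.07` (an assumed value) makes the vote sharply favor the closest entries while still letting multiple neighbors contribute.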