🤖 AI Summary
Existing Masked Autoencoders (MAEs) employ random masking, disregarding the variability of information content across patches and the requirements of downstream tasks, which limits representation discriminability and generalization. To address this, we propose an end-to-end differentiable, downstream-aware mask learning framework that, for the first time, backpropagates downstream-task gradients into the mask selection process of MAE pretraining. Our method jointly optimizes task-oriented dynamic masking policies across multiple levels, enabling gradient-driven mask scheduling without additional annotations, and supports plug-and-play integration of arbitrary downstream-task feedback signals. Extensive experiments demonstrate consistent and significant improvements over MAE and other baselines across diverse vision benchmarks, including image classification, object detection, and semantic segmentation, validating both the effectiveness and the generality of task-driven masking for self-supervised representation learning.
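To make "backpropagating downstream-task gradients into mask selection" concrete, here is a minimal, illustrative sketch (not the paper's exact architecture): a small scoring network assigns each patch a masking logit, and Gumbel noise plus a straight-through top-k estimator keeps the discrete mask selection differentiable, so any downstream loss applied to the masked output can update the scorer. All names and the specific estimator are assumptions for illustration.

```python
import torch

class DifferentiableMasker(torch.nn.Module):
    """Illustrative learnable masker: per-patch scores -> differentiable top-k mask."""

    def __init__(self, embed_dim: int, mask_ratio: float = 0.75):
        super().__init__()
        self.scorer = torch.nn.Linear(embed_dim, 1)  # per-patch mask logit
        self.mask_ratio = mask_ratio

    def forward(self, patches: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
        # patches: (batch, num_patches, embed_dim)
        logits = self.scorer(patches).squeeze(-1)              # (B, N)
        # Gumbel noise makes selection stochastic yet reparameterizable.
        gumbel = -torch.log(-torch.log(torch.rand_like(logits)))
        soft = torch.softmax((logits + gumbel) / tau, dim=-1)  # soft scores
        num_mask = int(self.mask_ratio * patches.shape[1])
        # Straight-through estimator: forward uses the discrete top-k mask,
        # backward uses the soft probabilities, so gradients reach the scorer.
        idx = soft.topk(num_mask, dim=-1).indices
        hard = torch.zeros_like(soft).scatter(-1, idx, 1.0)
        return hard + soft - soft.detach()                     # (B, N), 1 = masked

masker = DifferentiableMasker(embed_dim=64)
x = torch.randn(2, 16, 64)
mask = masker(x)
# Any downstream loss on the masked result can now update the scorer:
mask.sum().backward()
```

In a full pipeline, the downstream task's loss (classification, detection, etc.) would replace the placeholder `mask.sum()` objective, driving the masking policy toward task-relevant patches.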
📝 Abstract
Masked Autoencoder (MAE) is a notable method for self-supervised pretraining in visual representation learning. It operates by randomly masking image patches and reconstructing these masked patches using the unmasked ones. A key limitation of MAE lies in its disregard for the varying informativeness of different patches, as it uniformly selects patches to mask. To overcome this, some approaches propose masking based on patch informativeness. However, these methods often do not consider the specific requirements of downstream tasks, potentially leading to suboptimal representations for these tasks. In response, we introduce the Multi-level Optimized Mask Autoencoder (MLO-MAE), a novel framework that leverages end-to-end feedback from downstream tasks to learn an optimal masking strategy during pretraining. Our experimental findings highlight MLO-MAE's significant advancements in visual representation learning. Compared to existing methods, it demonstrates remarkable improvements across diverse datasets and tasks, showcasing its adaptability and efficiency.
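For reference, the uniform random masking the abstract critiques can be sketched in a few lines: every patch is equally likely to be dropped, regardless of how informative it is. This is a simplified rendering of the standard MAE masking step, with names chosen for illustration.

```python
import torch

def random_masking(patches: torch.Tensor, mask_ratio: float = 0.75):
    """MAE-style uniform random masking: shuffle patches, keep a fixed fraction."""
    B, N, D = patches.shape
    num_keep = int(N * (1 - mask_ratio))
    noise = torch.rand(B, N)                # one uniform score per patch
    ids_shuffle = noise.argsort(dim=1)      # random permutation per image
    ids_keep = ids_shuffle[:, :num_keep]    # indices of visible patches
    visible = torch.gather(
        patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, D))
    mask = torch.ones(B, N)                 # 1 = masked, 0 = visible
    mask.scatter_(1, ids_keep, 0.0)
    return visible, mask

x = torch.randn(2, 16, 8)                   # 2 images, 16 patches, dim 8
visible, mask = random_masking(x)           # 4 visible patches per image
```

The encoder then sees only `visible`, and the decoder reconstructs the patches where `mask == 1`; because `noise` ignores patch content, informative and uninformative patches are masked with equal probability, which is precisely the limitation MLO-MAE targets.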