D-Feat Occlusions: Diffusion Features for Robustness to Partial Visual Occlusions in Object Recognition

📅 2025-04-08

🤖 AI Summary

This work addresses the limited robustness of object recognition models to partial visual occlusion. We propose a dual-path, occlusion-resilient method built on a frozen diffusion model. Occlusion is formulated as an image completion task, and the diffusion model is used in two ways: (i) to inpaint occluded regions at the input level, and (ii) to supply context-aware features, taken from intermediate layers of Stable Diffusion, that are fused with the classifier's representations. Key contributions include: (1) the first integration of pretrained diffusion model intermediate features for occlusion-robust classification; (2) a dual-path enhancement paradigm that combines input restoration with feature fusion; and (3) the construction of the first benchmark dataset targeting realistic occlusion scenarios. Experiments show Top-1 accuracy gains for ResNet and ViT under synthetic ImageNet occlusions, and an average 12.3% performance boost over baselines on real-world occlusion benchmarks.

📝 Abstract

Diffusion models have been applied with notable success to a range of visual tasks. This paper aims to make classification models more robust to occlusions in object recognition by proposing a pipeline that uses a frozen diffusion model. Diffusion features have demonstrated success in image generation and image completion while capturing image context. Occlusion can be posed as an image completion problem by treating the pixels of the occluder as "missing." We hypothesize that such features can help hallucinate the visual features of an object behind occluding objects, and we therefore propose using them to make models more robust to occlusion. We design experiments that cover both input-based and feature-based augmentations. Input-based augmentations involve finetuning on images in which the occluder pixels have been inpainted; feature-based augmentations involve augmenting classification features with intermediate diffusion features. We demonstrate that our proposed use of diffusion-based features yields models that are more robust to partial object occlusions, for both Transformers and ConvNets, on ImageNet with simulated occlusions. We also propose a dataset of real-world occlusions and show that our method is more robust to partial object occlusions on it.
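
The "missing pixels" framing can be made concrete with a small sketch. The hypothetical helper `apply_occluder` below pastes a rectangular occluder onto a grayscale image and returns the binary mask that an inpainting model would be asked to fill. The rectangular shape, the constant fill value, and the nested-list image format are assumptions chosen to keep the example dependency-free; they are not taken from the paper, whose occluders and inpainting pipeline are not described here.

```python
from typing import List, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel values (assumed format)
Mask = List[List[int]]     # 1 where the occluder covers a pixel, 0 elsewhere


def apply_occluder(
    image: Image, top: int, left: int, height: int, width: int, fill: float = 0.0
) -> Tuple[Image, Mask]:
    """Return (occluded copy, mask); the input image is left unmodified.

    The rectangle is clipped to the image bounds. Pixels under the occluder
    are overwritten with `fill` and flagged as "missing" in the mask.
    """
    rows, cols = len(image), len(image[0])
    occluded = [row[:] for row in image]
    mask = [[0] * cols for _ in range(rows)]
    for r in range(max(top, 0), min(top + height, rows)):
        for c in range(max(left, 0), min(left + width, cols)):
            occluded[r][c] = fill
            mask[r][c] = 1
    return occluded, mask


def occluded_fraction(mask: Mask) -> float:
    """Fraction of pixels flagged as missing."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total
```

In an input-based setup, the `(occluded, mask)` pair would be handed to an inpainting model, and the classifier would then be finetuned on the completed images.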

Problem

Research questions and friction points this paper is trying to address.

Enhancing object recognition robustness to occlusions

Utilizing diffusion features for occlusion resilience

Evaluating and improving model robustness on real-world occlusion data

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses frozen diffusion model for occlusion robustness

Augments classification with diffusion features

Inpaints occluder pixels to create input-level augmentations for finetuning
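
As a minimal sketch of what "augmenting classification features with diffusion features" could look like, the example below L2-normalizes a classifier feature vector and a diffusion feature vector and concatenates them. Concatenation after normalization is an assumption made for illustration; the source does not state the fusion operator, and the names `l2_normalize` and `fuse_features` are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float], eps: float = 1e-12) -> List[float]:
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def fuse_features(
    classifier_feat: Sequence[float], diffusion_feat: Sequence[float]
) -> List[float]:
    """Concatenate the two normalized feature vectors.

    Normalizing first keeps either source from dominating purely because
    of its scale. This is an assumed design, not the paper's stated method.
    """
    return l2_normalize(classifier_feat) + l2_normalize(diffusion_feat)
```

A downstream classification head would then take the fused vector, whose length is the sum of the two input lengths, while the diffusion model itself stays frozen.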
