AI Summary
Existing implicit multi-part object representations struggle with occlusion and out-of-distribution (OOD) scenarios because they rely on indirect training objectives to model part-whole relationships implicitly, which makes part localization and identification insufficiently robust. To address this, we propose an explicit graph-structured representation framework: (1) a differentiable graph construction module coupled with a self-supervised collaborative clustering algorithm for end-to-end part discovery and relational modeling; and (2) three benchmarks designed to evaluate occlusion- and OOD-robust multi-part object understanding. Our method integrates graph neural networks, multi-scale segmentation, and association modeling. It significantly improves part discovery quality across synthetic, real-world, and in-the-wild images, enables accurate part-level recognition under complex occlusion, and reduces downstream attribute prediction error by 32%.
Abstract
Discovering object-centric representations from images can significantly enhance the robustness, sample efficiency, and generalizability of vision models. Work on images with multi-part objects typically follows an implicit object representation approach, which fails to recognize the learned objects in occluded or out-of-distribution contexts. This failure stems from the assumption that object part-whole relations are implicitly encoded into the representations through indirect training objectives. We address this limitation by proposing a novel method that leverages explicit graph representations for parts, and we present a co-part object discovery algorithm. We then introduce three benchmarks to evaluate the robustness of object-centric methods in recognizing multi-part objects within occluded and out-of-distribution settings. Experimental results on simulated, realistic, and real-world images show marked improvements in the quality of discovered objects compared to state-of-the-art methods, as well as accurate recognition of multi-part objects in occluded and out-of-distribution contexts. We also show that the discovered object-centric representations can more accurately predict key object properties in a downstream task, highlighting the potential of our method to advance the field of object-centric representations.