ShapeY: A Principled Framework for Measuring Shape Recognition Capacity via Nearest-Neighbor Matching

📅 2026-04-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current deep visual models over-rely on non-shape cues such as texture and background for object recognition, which leaves them with poor invariance to changes in 3D viewpoint and appearance. To address this limitation, this work proposes a fine-grained evaluation of embedding spaces centered on 3D shape similarity. The authors construct a benchmark of 68,200 multi-view grayscale renderings of 200 3D objects and use several analyses (nearest-neighbor matching, viewpoint tuning curves, and ordered matching grids) to systematically assess whether models cluster object views by shape. Experiments across 321 pretrained models reveal a widespread deficiency in shape understanding, with most models failing to achieve robust cross-viewpoint shape recognition.

📝 Abstract

Object recognition (OR) in humans relies heavily on shape cues and the ability to recognize objects across varying 3D viewpoints. Unlike humans, deep networks often rely on non-shape cues such as texture and background, leading to vulnerabilities in generalization and robustness. To address this gap, we introduce ShapeY, a novel and principled benchmarking framework designed to evaluate shape-based recognition capability in OR systems. ShapeY comprises 68,200 grayscale images of 200 3D objects rendered from multiple viewpoints and optionally subjected to non-shape "appearance" changes. Using a nearest-neighbor matching task, ShapeY probes the fine-grained structure of an OR system's embedding space by evaluating whether object views are clustered by 3D shape similarity across varying 3D viewpoints and other non-shape changes. ShapeY provides a suite of quantitative and qualitative performance readouts, including error rate graphs, viewpoint tuning curves, histograms of positive and negative matching scores, and grids showing ordered best matches, which together offer a comprehensive evaluation of an OR system's shape understanding capability. Testing of 321 pre-trained networks with diverse architectures reveals significant challenges in achieving robust shape-based recognition: even state-of-the-art models struggle to generalize consistently across 3D viewpoint and appearance changes, and are prone to infrequent but egregious matches between objects of obviously different shape. ShapeY establishes a principled framework for advancing artificial vision systems toward human-like shape recognition capabilities, emphasizing the importance of disentangled and invariant object encodings.
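The nearest-neighbor matching task described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `(object_id, view_index, embedding)` tuple layout, the use of cosine similarity, and the `exclusion_radius` parameter (which hides same-object views that are close in viewpoint, so that a match must generalize across viewpoint) are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbor_error_rate(views, exclusion_radius=0):
    """Fraction of query views whose best match belongs to a different object.

    views: list of (object_id, view_index, embedding) tuples (hypothetical layout).
    exclusion_radius: same-object candidates whose view_index is within this
        distance of the query are skipped (an assumed way to force matching
        across a viewpoint gap).
    """
    errors = 0
    scored = 0
    for i, (obj_q, view_q, emb_q) in enumerate(views):
        best_score = -math.inf
        best_obj = None
        for j, (obj_c, view_c, emb_c) in enumerate(views):
            if i == j:
                continue  # never match a view to itself
            if obj_c == obj_q and abs(view_c - view_q) <= exclusion_radius:
                continue  # hide nearby views of the same object
            score = cosine_similarity(emb_q, emb_c)
            if score > best_score:
                best_score, best_obj = score, obj_c
        if best_obj is None:
            continue  # no eligible candidate for this query
        scored += 1
        if best_obj != obj_q:
            errors += 1
    return errors / scored if scored else 0.0


# Toy embeddings: two objects, two views each (made-up values).
toy_views = [
    ("mug", 0, [1.0, 0.0]),
    ("mug", 1, [0.9, 0.1]),
    ("car", 0, [0.0, 1.0]),
    ("car", 1, [0.1, 0.9]),
]
print(nearest_neighbor_error_rate(toy_views, exclusion_radius=0))
print(nearest_neighbor_error_rate(toy_views, exclusion_radius=1))
```

In this toy setup, increasing `exclusion_radius` removes the easy same-object matches, so the error rate can only be kept low if the embedding places distant views of the same object closer together than views of other objects. Plotting the error rate against the radius would give a curve in the spirit of the error rate graphs mentioned in the abstract, under the assumptions stated above.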

Problem

Research questions and friction points this paper is trying to address.

shape recognition

object recognition

3D viewpoint invariance

non-shape cues

generalization robustness

Innovation

Methods, ideas, or system contributions that make the work stand out.

shape recognition

nearest-neighbor matching

viewpoint invariance