Human-Like Coarse Object Representations in Vision Models

📅 2026-02-12

📈 Citations: 0

✨ Influential: 0

📝 Abstract

Humans appear to represent objects for intuitive physics with coarse, volumetric "bodies" that smooth over concavities, trading fine visual detail for efficient physical prediction, yet the internal structure of these bodies is largely unknown. Segmentation models, in contrast, optimize pixel-accurate masks that may misalign with such bodies. We ask whether, and when, these models nonetheless acquire human-like bodies. Using a time-to-collision (TTC) behavioral paradigm, we introduce a comparison pipeline and an alignment metric, then vary model training time, model size, and effective capacity (via pruning). Across all manipulations, alignment with human behavior follows an inverse U-shaped curve: small, briefly trained, or pruned models under-segment objects into blobs; large, fully trained models over-segment and produce boundary wiggles; and an intermediate "ideal body granularity" best matches humans. This suggests that human-like coarse bodies emerge from resource constraints rather than from bespoke biases, and it points to simple knobs (early checkpoints, modest architectures, light pruning) for eliciting physics-efficient representations. We situate these results within resource-rational accounts that balance recognition detail against physical affordances.
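The abstract does not say how TTC is derived from a model's segmentation or how the alignment metric is defined, so the following is only a toy sketch under stated assumptions, not the authors' pipeline. It assumes an object is a 2D polygon, approximates a "coarse body" by the polygon's convex hull (one simple way to smooth concavities, not necessarily the paper's), has a point approach along +x at constant speed, and scores alignment as the Pearson correlation between model-derived and human TTC across stimuli. All names (`C_SHAPE`, `convex_hull`, `time_to_collision`, `pearson_r`) are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]

# Hypothetical test object: a C-shaped polygon whose notch (x in 0..3, y in 1..3)
# opens toward -x, i.e. toward an object approaching from the left.
C_SHAPE: List[Point] = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 3), (3, 3), (3, 1), (0, 1)]


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise.

    Used here as a stand-in "coarse body" (an assumption): the hull fills every concavity.
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def time_to_collision(polygon: Sequence[Point], start: Point, speed: float) -> Optional[float]:
    """Time until a point moving in +x at constant speed first reaches the polygon boundary.

    Assumes `start` lies outside the polygon. Returns None if the path never meets it.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    x0, y0 = start
    hits: List[float] = []
    n = len(polygon)
    for i in range(n):
        (x1, y1), (x2, y2) = polygon[i], polygon[(i + 1) % n]
        if y1 == y2:
            # Horizontal edge: only touched if the path runs along it.
            if y1 == y0:
                hits.extend(xe for xe in (x1, x2) if xe >= x0)
            continue
        if min(y1, y2) <= y0 <= max(y1, y2):
            t = (y0 - y1) / (y2 - y1)  # fraction along the edge at height y0
            x = x1 + t * (x2 - x1)
            if x >= x0:
                hits.append(x)
    if not hits:
        return None
    return (min(hits) - x0) / speed


def pearson_r(model_ttc: Sequence[float], human_ttc: Sequence[float]) -> float:
    """Pearson correlation across stimuli; one possible alignment score (an assumption)."""
    n = len(model_ttc)
    if n != len(human_ttc) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(model_ttc) / n
    my = sum(human_ttc) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(model_ttc, human_ttc))
    sxx = sum((a - mx) ** 2 for a in model_ttc)
    syy = sum((b - my) ** 2 for b in human_ttc)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    start = (-2.0, 2.0)  # approaching the notch head-on
    print("fine outline TTC:", time_to_collision(C_SHAPE, start, speed=1.0))  # 5.0
    print("coarse hull TTC:", time_to_collision(convex_hull(C_SHAPE), start, speed=1.0))  # 2.0
```

With the notch facing the approaching point, the pixel-accurate outline gives a TTC of 5.0 while the hull gives 2.0, because the hull fills the notch and predicts contact earlier. Computing such an alignment score at each checkpoint, model size, or pruning level and looking for its peak is one way to probe the inverse U-shape the abstract reports.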

Problem

Research questions and friction points this paper is trying to address.

coarse object representations

intuitive physics

human-like representations

segmentation models

resource constraints

Innovation

Methods, ideas, or system contributions that make the work stand out.

coarse object representations

intuitive physics

resource-rationality