I Walk the Line: Examining the Role of Gestalt Continuity in Object Binding for Vision Transformers

📅 2026-04-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study asks how Vision Transformers (ViTs) bind image regions into objects, and specifically whether they rely on the Gestalt principle of continuity over and above similarity and proximity. Using synthetic image datasets, the authors show that binding probes are sensitive to continuity across a wide range of pretrained ViTs. They then identify particular attention heads that track continuity, show that these heads generalize across datasets, and ablate them to show that they often contribute to representations that encode object binding.

📝 Abstract

Object binding is a foundational process in visual cognition, during which low-level perceptual features are joined into object representations. Binding has been considered a fundamental challenge for neural networks, and a major milestone on the way to artificial models with flexible visual intelligence. Recently, several investigations have demonstrated evidence that binding mechanisms emerge in pretrained vision models, enabling them to associate portions of an image that contain an object. The question remains: how are these models binding objects together? In this work, we investigate whether vision models rely on the principle of Gestalt continuity to perform object binding, over and above other principles like similarity and proximity. Using synthetic datasets, we demonstrate that binding probes are sensitive to continuity across a wide range of pretrained vision transformers. Next, we uncover particular attention heads that track continuity, and show that these heads generalize across datasets. Finally, we ablate these attention heads, and show that they often contribute to producing representations that encode object binding.
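The abstract does not say how the attention heads are ablated, so the following is only a minimal sketch of one common choice, zero-ablation, on a toy self-attention layer written in NumPy. Everything here is an assumption made for illustration: the function name `multi_head_attention`, the `ablate` argument, the random weights, and the layer sizes do not come from the paper, and the authors may well use a different scheme (for example, mean-ablation) on real pretrained ViTs.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads, ablate=()):
    """Toy self-attention over patch tokens x of shape (n_tokens, d_model).

    Heads whose indices appear in `ablate` (a hypothetical argument) have
    their outputs set to zero before the output projection w_o.
    """
    n_tokens, d_model = x.shape
    d_head = d_model // n_heads

    def split(t):
        # (n_tokens, d_model) -> (n_heads, n_tokens, d_head)
        return t.reshape(n_tokens, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    # Attention weights per head: (n_heads, n_tokens, n_tokens)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))
    heads = attn @ v  # (n_heads, n_tokens, d_head)

    for h in ablate:
        heads[h] = 0.0  # zero-ablation: remove this head's contribution

    # Concatenate heads back to (n_tokens, d_model) and project.
    merged = heads.transpose(1, 0, 2).reshape(n_tokens, d_model)
    return merged @ w_o

# Random stand-ins for patch embeddings and layer weights (not a real ViT).
rng = np.random.default_rng(0)
n_tokens, d_model, n_heads = 6, 8, 4
x = rng.normal(size=(n_tokens, d_model))
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) for _ in range(4))

full = multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads)
ablated = multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads, ablate=[2])

# Per-token change in the layer output caused by removing head 2.
effect = np.linalg.norm(full - ablated, axis=1)
print(effect)
```

In an experiment of the kind the abstract describes, one would presumably compare a binding probe's accuracy on `full` versus `ablated` representations rather than the raw output difference; that probe is not sketched here because the abstract gives no details about it.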

Problem

Research questions and friction points this paper is trying to address.

object binding

Gestalt continuity

vision transformers

visual cognition

perceptual organization

Innovation

Methods, ideas, or system contributions that make the work stand out.

object binding

Gestalt continuity

vision transformers