Context Sensitivity Improves Human-Machine Visual Alignment

📅 2026-04-15
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the limited context sensitivity of existing visual representation models, which hinders alignment between human and machine visual judgments. The authors propose a method for computing context-sensitive similarity from neural network embeddings and apply it to a triplet odd-one-out task in which an anchor image serves as the context for the comparison. Evaluated on both original and human-aligned vision foundation models, the approach improves odd-one-out accuracy by up to 15% over a context-insensitive model.

📝 Abstract
Modern machine learning models typically represent inputs as fixed points in a high-dimensional embedding space. While this approach has proven powerful for a wide range of downstream tasks, it fundamentally differs from the way humans process information. Because humans are constantly adapting to their environment, they represent objects and their relationships in a highly context-sensitive manner. To address this gap, we propose a method for context-sensitive similarity computation from neural network embeddings, applied to modeling a triplet odd-one-out task with an anchor image serving as simultaneous context. Modeling context enables us to achieve up to a 15% improvement in odd-one-out accuracy over a context-insensitive model. We find that this improvement is consistent across both original and "human-aligned" vision foundation models.
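
To make the contrast between context-insensitive and context-sensitive similarity concrete, here is a minimal Python sketch of an odd-one-out decision over three embeddings. It is an illustration under stated assumptions, not the authors' method: the abstract does not specify how context enters the similarity, so the `context_weights` function (a softmax over the anchor's absolute activations), the choice of cosine similarity, the convention that the anchor is the first item of the triplet, and the toy vectors are all hypothetical.

```python
import math

def weighted_cosine(u, v, w=None):
    """Cosine similarity with optional non-negative per-dimension weights."""
    if w is None:
        w = [1.0] * len(u)  # uniform weights: ordinary cosine similarity
    dot = sum(wi * a * b for wi, a, b in zip(w, u, v))
    norm_u = math.sqrt(sum(wi * a * a for wi, a in zip(w, u)))
    norm_v = math.sqrt(sum(wi * b * b for wi, b in zip(w, v)))
    return dot / (norm_u * norm_v)

def context_weights(anchor, temperature=1.0):
    """ASSUMPTION (not from the paper): derive per-dimension weights from the
    anchor via a softmax over its absolute activations, so dimensions that are
    prominent in the anchor count more in every pairwise comparison."""
    scores = [abs(a) / temperature for a in anchor]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    # Rescale so the weights sum to the number of dimensions.
    return [len(anchor) * e / total for e in exps]

def odd_one_out(triplet, w=None):
    """Return the index of the item left out of the most similar pair."""
    pairs = [(0, 1), (0, 2), (1, 2)]
    i, j = max(pairs, key=lambda p: weighted_cosine(triplet[p[0]], triplet[p[1]], w))
    return 3 - i - j  # indices sum to 3, so this is the remaining one

# Toy embeddings (hypothetical); the anchor is item 0 of the triplet.
anchor = [2.0, 0.1, 0.1]
a = [1.0, 1.5, 0.0]
b = [0.2, 1.5, 0.3]
triplet = [anchor, a, b]

print(odd_one_out(triplet))                           # context-insensitive choice
print(odd_one_out(triplet, context_weights(anchor)))  # anchor-conditioned choice
```

With these toy vectors the unweighted rule pairs `a` with `b` and singles out the anchor (index 0), while the anchor-weighted rule pairs the anchor with `a` and singles out `b` (index 2). The point is only that the same fixed embeddings can yield different odd-one-out choices once similarity depends on context.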

Problem

Research questions and friction points this paper is trying to address.

context sensitivity
human-machine alignment
visual representation
embedding space
odd-one-out task

Innovation

Methods, ideas, or system contributions that make the work stand out.

context-sensitive similarity
human-machine alignment
odd-one-out task
vision foundation models
embedding adaptation