Understanding Cross-Model Perceptual Invariances Through Ensemble Metamers

📅 2025-04-02
🤖 AI Summary
This study addresses the alignment between the perceptual invariances of artificial neural networks and human vision. The authors propose a metamer generation method that combines CNNs and Vision Transformers (ViTs) in an ensemble, extracting shared representational subspaces to systematically quantify semantic identifiability, naturalness, and cross-task transferability. Results show that CNN-generated metamers achieve higher semantic fidelity, superior human recognizability, and stronger generalization across tasks; in contrast, ViT-generated metamers exhibit greater image naturalness but markedly weaker transfer performance. This provides empirical evidence that architectural biases constrain perceptual invariance, revealing a source of divergence between machine and human visual processing, and offers principled insights into architecture-dependent representational priors for interpretable model analysis and human-machine vision alignment.

📝 Abstract
Understanding the perceptual invariances of artificial neural networks is essential for improving explainability and aligning models with human vision. Metamers - stimuli that are physically distinct yet produce identical neural activations - serve as a valuable tool for investigating these invariances. We introduce a novel approach to metamer generation by leveraging ensembles of artificial neural networks, capturing shared representational subspaces across diverse architectures, including convolutional neural networks and vision transformers. To characterize the properties of the generated metamers, we employ a suite of image-based metrics that assess factors such as semantic fidelity and naturalness. Our findings show that convolutional neural networks generate more recognizable and human-like metamers, while vision transformers produce realistic but less transferable metamers, highlighting the impact of architectural biases on representational invariances.
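The core idea in the abstract, starting from a different input and optimizing it until the shared activations of an ensemble match those of a reference stimulus, can be sketched with a toy example. The random linear "ensemble members", dimensions, learning rate, and iteration count below are illustrative stand-ins, not the paper's actual networks or procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "ensemble": two fixed random linear feature extractors standing in
# for the deep networks (CNN / ViT) used in the paper. 16 features each
# over a 64-dimensional input, so the stacked map is underdetermined.
W1 = rng.normal(size=(16, 64)) / 8.0
W2 = rng.normal(size=(16, 64)) / 8.0

def features(x):
    # Concatenated activations of both ensemble members.
    return np.concatenate([W1 @ x, W2 @ x])

# Reference stimulus and its target activations.
x_ref = rng.normal(size=64)
target = features(x_ref)

# Start from noise and minimize 0.5 * ||features(x) - target||^2
# by gradient descent; the result is a metamer candidate: an input
# whose ensemble activations match the reference's.
x = rng.normal(size=64)
lr = 0.1
for _ in range(3000):
    err1 = W1 @ x - W1 @ x_ref
    err2 = W2 @ x - W2 @ x_ref
    grad = W1.T @ err1 + W2.T @ err2  # gradient of the squared mismatch
    x -= lr * grad

mismatch = np.linalg.norm(features(x) - target)  # near zero after optimization
distance = np.linalg.norm(x - x_ref)             # stays large: a true metamer
```

Because the stacked feature map has a non-trivial null space (32 features over 64 input dimensions), the optimized input can reproduce the reference's activations while remaining far from it in input space, which is precisely the metamer property the paper probes at the scale of deep networks.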
Problem

Research questions and friction points this paper is trying to address.

Investigates perceptual invariances in neural networks using metamers
Compares metamer generation across CNN and vision transformer architectures
Assesses semantic fidelity and naturalness of generated metamers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Ensemble neural networks generate shared metamers
Image metrics evaluate semantic fidelity and naturalness
Compare CNN and transformer perceptual invariances
Lukas Boehm
Machine Learning and Data Analytics Lab, Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander University, Erlangen, Germany
Jonas Leo Mueller
Machine Learning and Data Analytics Lab, Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander University, Erlangen, Germany
Leo Schwinn
Technical University of Munich
Machine Learning · Deep Learning · Adversarial Attacks
Bjoern M. Eskofier
MaD Lab, FAU Erlangen-Nürnberg & TDH Group, Helmholtz Munich
Machine Learning · Artificial Intelligence · Wearable Computing · Digital Health · Biomedical Engineering
Dario Zanca
Head of Applied Machine Learning Group @ MaD Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg
Deep Learning · Human-inspired AI · Robustness · AI Psychophysics · Computer Vision