Understanding Cross-Model Perceptual Invariances Through Ensemble Metamers

📅 2025-04-02
🤖 AI Summary
This study addresses the alignment between the perceptual invariances of artificial neural networks and human vision. The authors propose a metamer generation method that combines CNNs and Vision Transformers (ViTs) in an ensemble, extracting shared representational subspaces to systematically quantify semantic identifiability, naturalness, and cross-task transferability. Results show that CNN-generated metamers achieve higher semantic fidelity, superior human recognizability, and stronger generalization across tasks; in contrast, ViT-generated metamers exhibit greater image naturalness but markedly weaker transfer performance. This provides empirical evidence that architectural biases constrain perceptual invariance, revealing a source of divergence between machine and human visual processing, and offers principled insights into architecture-dependent representational priors for interpretable model analysis and human-machine vision alignment.

📝 Abstract
Understanding the perceptual invariances of artificial neural networks is essential for improving explainability and aligning models with human vision. Metamers - stimuli that are physically distinct yet produce identical neural activations - serve as a valuable tool for investigating these invariances. We introduce a novel approach to metamer generation by leveraging ensembles of artificial neural networks, capturing shared representational subspaces across diverse architectures, including convolutional neural networks and vision transformers. To characterize the properties of the generated metamers, we employ a suite of image-based metrics that assess factors such as semantic fidelity and naturalness. Our findings show that convolutional neural networks generate more recognizable and human-like metamers, while vision transformers produce realistic but less transferable metamers, highlighting the impact of architectural biases on representational invariances.
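The core idea in the abstract, starting from a different input and optimizing it until the shared activations of an ensemble match those of a reference stimulus, can be sketched with a toy example. The random linear "ensemble members", dimensions, learning rate, and iteration count below are illustrative stand-ins, not the paper's actual networks or procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "ensemble": two fixed random linear feature extractors standing in
# for the deep networks (CNN / ViT) used in the paper. 16 features each
# over a 64-dimensional input, so the stacked map is underdetermined.
W1 = rng.normal(size=(16, 64)) / 8.0
W2 = rng.normal(size=(16, 64)) / 8.0

def features(x):
    # Concatenated activations of both ensemble members.
    return np.concatenate([W1 @ x, W2 @ x])

# Reference stimulus and its target activations.
x_ref = rng.normal(size=64)
target = features(x_ref)

# Start from noise and minimize 0.5 * ||features(x) - target||^2
# by gradient descent; the result is a metamer candidate: an input
# whose ensemble activations match the reference's.
x = rng.normal(size=64)
lr = 0.1
for _ in range(3000):
    err1 = W1 @ x - W1 @ x_ref
    err2 = W2 @ x - W2 @ x_ref
    grad = W1.T @ err1 + W2.T @ err2  # gradient of the squared mismatch
    x -= lr * grad

mismatch = np.linalg.norm(features(x) - target)  # near zero after optimization
distance = np.linalg.norm(x - x_ref)             # stays large: a true metamer
```

Because the stacked feature map has a non-trivial null space (32 features over 64 input dimensions), the optimized input can reproduce the reference's activations while remaining far from it in input space, which is precisely the metamer property the paper probes at the scale of deep networks.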
Problem

Research questions and friction points this paper is trying to address.

Investigates perceptual invariances in neural networks using metamers
Compares metamer generation across CNN and vision transformer architectures
Assesses semantic fidelity and naturalness of generated metamers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Ensemble neural networks generate shared metamers
Image metrics evaluate semantic fidelity and naturalness
Compare CNN and transformer perceptual invariances
Lukas Boehm
Machine Learning and Data Analytics Lab, Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander University, Erlangen, Germany
Jonas Leo Mueller
Machine Learning and Data Analytics Lab, Department Artificial Intelligence in Biomedical Engineering, Friedrich-Alexander University, Erlangen, Germany
Leo Schwinn
Technical University of Munich
Machine Learning · Deep Learning · Adversarial Attacks
Bjoern M. Eskofier
MaD Lab, FAU Erlangen-Nürnberg & TDH Group, Helmholtz Munich
Machine Learning · Artificial Intelligence · Wearable Computing · Digital Health · Biomedical Engineering
Dario Zanca
Head of Applied Machine Learning Group @ MaD Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg
Deep Learning · Human-inspired AI · Robustness · AI Psychophysics · Computer Vision