End-to-End Implicit Neural Representations for Classification

📅 2025-03-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the significant performance gap of implicit neural representations (INRs) relative to pixel-based models (e.g., CNNs) in image classification. The authors propose the first end-to-end trainable SIREN-based classification framework. Methodologically, they introduce a SIREN-specific meta-learned initialization and a learned learning-rate scheme, abandoning hand-crafted symmetry modeling in favor of directly optimizing the discriminative capability of the implicit representations; the classifier head is a lightweight Transformer. The contributions are threefold: (1) the first end-to-end SIREN classifier on ImageNet-1K, achieving 23.6% top-1 accuracy; (2) the first high-resolution INR classification benchmark; and (3) state-of-the-art results on CIFAR-10 (59.6% without augmentations, 64.7% with) and a 60.8% baseline on Imagenette, demonstrating both the efficacy and scalability of INRs for direct discriminative tasks.

📝 Abstract
Implicit neural representations (INRs) such as NeRF and SIREN encode a signal in neural network parameters and show excellent results for signal reconstruction. Using INRs for downstream tasks, such as classification, is however not straightforward. Inherent symmetries in the parameters pose challenges, and current works primarily focus on designing architectures that are equivariant to these symmetries. However, INR-based classification still significantly underperforms compared to pixel-based methods like CNNs. This work presents an end-to-end strategy for initializing SIRENs together with a learned learning-rate scheme, to yield representations that improve classification accuracy. We show that a simple, straightforward Transformer model applied to a meta-learned SIREN, without incorporating explicit symmetry equivariances, outperforms the current state-of-the-art. On the CIFAR-10 SIREN classification task, we improve the state-of-the-art without augmentations from 38.8% to 59.6%, and from 63.4% to 64.7% with augmentations. We demonstrate scalability on the high-resolution Imagenette dataset, achieving reasonable reconstruction quality with a classification accuracy of 60.8%, and are the first to do INR classification on the full ImageNet-1K dataset, where we achieve a SIREN classification performance of 23.6%. To the best of our knowledge, no other SIREN classification approach has managed to set a classification baseline for any high-resolution image dataset. Our code is available at https://github.com/SanderGielisse/MWT
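For context, a SIREN represents an image as a small MLP with sine activations that maps pixel coordinates to color values. Below is a minimal, dependency-free sketch of a SIREN forward pass with the standard initialization from Sitzmann et al. (first layer U(-1/n, 1/n); deeper layers U(-sqrt(6/n)/omega_0, sqrt(6/n)/omega_0), omega_0 = 30); all layer sizes here are illustrative, and the final linear layer reuses the hidden-layer init for brevity, which real implementations often adjust.

```python
import math
import random

def siren_init(n_in, n_out, first_layer, omega_0=30.0):
    """SIREN weight init (Sitzmann et al., 2020): keeps pre-activations
    in a range where sin() stays expressive across depth."""
    bound = 1.0 / n_in if first_layer else math.sqrt(6.0 / n_in) / omega_0
    W = [[random.uniform(-bound, bound) for _ in range(n_in)]
         for _ in range(n_out)]
    b = [0.0] * n_out
    return W, b

def siren_layer(x, W, b, omega_0=30.0):
    """One sine layer: y_j = sin(omega_0 * (W_j . x + b_j))."""
    return [math.sin(omega_0 * (sum(w * xi for w, xi in zip(row, x)) + bj))
            for row, bj in zip(W, b)]

def siren_forward(coords, layers, omega_0=30.0):
    """Map a 2-D pixel coordinate to an output vector; the last layer is
    linear (no sine), as is conventional for SIRENs."""
    x = coords
    for W, b in layers[:-1]:
        x = siren_layer(x, W, b, omega_0)
    W, b = layers[-1]
    return [sum(w * xi for w, xi in zip(row, x)) + bj
            for row, bj in zip(W, b)]

random.seed(0)
layers = [siren_init(2, 16, first_layer=True),
          siren_init(16, 16, first_layer=False),
          siren_init(16, 3, first_layer=False)]  # coords -> RGB
rgb = siren_forward([0.25, -0.5], layers)
```

In the paper's setting, the weights of such a network (after fitting one image) become the input features that the Transformer head classifies.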
Problem

Research questions and friction points this paper is trying to address.

Improving classification accuracy using implicit neural representations (INRs).
Overcoming inherent symmetries in INR parameters for better performance.
Achieving scalable INR-based classification on high-resolution image datasets.
Innovation

Methods, ideas, or system contributions that make the work stand out.

End-to-end meta-learned SIREN initialization combined with a learned learning-rate scheme
A plain Transformer classifier applied to meta-learned SIREN parameters, without explicit symmetry equivariances
Scalable INR classification on high-resolution images, including the full ImageNet-1K dataset
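The learned learning-rate idea in the second bullet can be illustrated at toy scale: each inner-loop step that fits a SIREN to an image uses its own (meta-learned) step size, in the spirit of Meta-SGD. The sketch below fits a single-weight sine model with hand-derived gradients; the per-step learning-rate values are illustrative placeholders, not values from the paper.

```python
import math

def fit_siren_weight(x, y, w0, lrs, omega_0=30.0):
    """Toy inner loop: fit f(x) = sin(omega_0 * w * x) to target y with a
    few gradient steps on squared error, each step using its own learned
    step size (a Meta-SGD-style scheme at 1-parameter scale)."""
    w = w0
    for lr in lrs:  # one learned learning rate per inner step
        pred = math.sin(omega_0 * w * x)
        # d/dw of (pred - y)^2, via the chain rule
        grad = 2 * (pred - y) * math.cos(omega_0 * w * x) * omega_0 * x
        w -= lr * grad
    return w

# Illustrative per-step learning rates; in the paper these would be
# meta-learned jointly with the initialization.
w_fit = fit_siren_weight(0.5, 0.8, w0=0.01, lrs=[1e-4, 1e-4, 5e-5])
```

In the full method, the fitted weights (here just `w_fit`) would then be passed to the Transformer head for classification.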