What Cohort INRs Encode and Where to Freeze Them

📅 2026-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work investigates which layers of cohort-trained implicit neural representations (INRs) learn transferable representations and what those representations encode, two questions that remain poorly understood. Sweeping the depth at which the shared encoder is frozen at test time, for both SIREN and Fourier-feature MLPs (FFMLPs), the study finds that the optimal freeze depth coincides with the layer of highest weight stable rank, and that freezing there matches or improves on the standard fine-tuning recipe. It also applies sparse autoencoders (SAEs) to INR activations for the first time, decomposing them into sparse dictionary atoms. SIREN atoms are spatially localized and fire in confined regions of the coordinate plane, whereas FFMLP atoms span the image and trace the contours of memorized cohort signals; ablating a single FFMLP atom out of 4096 can reduce PSNR by up to 10.6 dB. These findings reveal qualitatively different representational structure in the two architectures and suggest stable rank as a guide for choosing the freeze depth.
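
The stable-rank criterion mentioned above can be illustrated with a minimal sketch. It assumes the common definition of stable rank (squared Frobenius norm divided by squared spectral norm); the page does not state the exact definition the paper uses. The function names `stable_rank` and `pick_freeze_depth` and the list-of-matrices input are hypothetical, chosen only for illustration.

```python
import numpy as np


def stable_rank(weight):
    """Stable rank ||W||_F^2 / ||W||_2^2 of a 2-D weight matrix.

    Assumption: this is the standard definition; the paper's exact
    variant is not given in the summary.
    """
    # Singular values are returned in descending order.
    s = np.linalg.svd(np.asarray(weight, dtype=float), compute_uv=False)
    return float(np.sum(s ** 2) / s[0] ** 2)


def pick_freeze_depth(weights):
    """Return the index of the layer with the highest stable rank.

    `weights` is a hypothetical list of the shared encoder's weight
    matrices, ordered from the first layer to the last.
    """
    ranks = [stable_rank(w) for w in weights]
    return int(np.argmax(ranks)), ranks
```

Under this reading, one would compute `pick_freeze_depth` once on the cohort-trained encoder and freeze at the returned index when fitting a new signal, instead of sweeping every depth. Whether the layers before that index are also frozen is an assumption the summary does not settle.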

📝 Abstract

Reusing the early layers of cohort-trained INRs as initialization for new signals has been shown to accelerate and improve signal fitting, yet it remains unclear which layers of the shared encoder learn transferable representations and what those representations encode. We address both questions for two standard backbones, SIREN and Fourier-feature MLPs (FFMLP). First, sweeping the freeze depth across the shared encoder at test time, we find that the optimum coincides with the layer of highest weight stable rank. Moreover, freezing at this depth matches or improves on the standard fine-tuning recipe across all our experiments. Second, identifying which layer transfers does not characterize what that layer encodes. To address this we adopt sparse autoencoders (SAEs), the dominant tool in mechanistic interpretability, and present the first SAE decomposition of INR activations into sparse dictionary atoms. Interestingly, SIREN and FFMLP achieve comparable cohort-fitting quality, but learn qualitatively different dictionaries. Cohort SIREN's atoms are localized, tiling the coordinate plane such that each atom fires in a confined region independent of cohort content. Cohort FFMLP's atoms are image-spanning, tracing the contours of memorized cohort signals. Single-atom ablations confirm causal use of these dictionaries: a single FFMLP atom out of 4096 can drop PSNR by up to 10.6 dB across the image, while SIREN ablations remain confined to where the atom fires. Together, these results give the first mechanistic account of what transfers in cohort-trained INRs and turn their activations into inspectable dictionary atoms. These tools open a path towards characterizing what INRs encode and towards architectures designed for generalization rather than memorization.
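
To make the single-atom ablation concrete, here is a rough sketch of the mechanics, not the authors' implementation. It assumes a plain ReLU sparse autoencoder with a linear encoder and decoder; the abstract does not specify the SAE variant, its training objective, or how ablated activations are fed back into the INR. All function and parameter names (`sae_encode`, `sae_decode`, `ablate_atom`, `psnr`, `w_enc`, `w_dec`, and so on) are hypothetical.

```python
import numpy as np


def sae_encode(acts, w_enc, b_enc):
    """Map INR activations (n_points, d) to non-negative atom codes (n_points, n_atoms)."""
    return np.maximum(acts @ w_enc + b_enc, 0.0)


def sae_decode(codes, w_dec, b_dec):
    """Reconstruct activations (n_points, d) from atom codes."""
    return codes @ w_dec + b_dec


def ablate_atom(codes, atom):
    """Return a copy of the codes with one dictionary atom zeroed out."""
    out = codes.copy()
    out[:, atom] = 0.0
    return out


def psnr(pred, target, peak=1.0):
    """Peak signal-to-noise ratio in dB for signals with the given peak value."""
    mse = np.mean((np.asarray(pred) - np.asarray(target)) ** 2)
    return float(10.0 * np.log10(peak ** 2 / mse))
```

In an experiment of the kind the abstract describes, one would presumably decode the ablated codes, pass the reconstructed activations through the remaining INR layers, and compare `psnr` of the rendered image with and without the ablation. Inspecting where `codes[:, atom]` is nonzero over the coordinate grid would then show whether an atom is localized or image-spanning.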

Problem

Research questions and friction points this paper is trying to address.

implicit neural representations

transferability

mechanistic interpretability

cohort INRs

representation encoding

Innovation

Methods, ideas, or system contributions that make the work stand out.

Implicit Neural Representations

Sparse Autoencoders

Mechanistic Interpretability