🤖 AI Summary
This paper challenges the conventional assumption that perceptual tasks rely primarily on high-variance principal components, revealing substantial redundancy in the input across diverse vision and audio tasks, including image classification, semantic segmentation, optical flow estimation, depth prediction, and speech discrimination. Method: We propose a cross-domain orthogonal subspace decomposition framework spanning pixel, Fourier, and wavelet bases, together with a unified evaluation protocol to systematically assess task performance across low-, mid-, and high-variance subspaces. Contribution/Results: Empirically, we demonstrate that 80–95% of original task performance is retained even when restricting the input to a single arbitrary orthogonal subspace, including the lowest-variance subspace, which contains less than 5% of total signal energy. This establishes that the redundancy is universal across bases, modalities, and tasks, challenging classical dimensionality-reduction paradigms. Crucially, we show that task-relevant signal is broadly distributed across the entire spectrum, providing a foundation for lightweight modeling and robust perception.
📝 Abstract
We show that many perception tasks, ranging from visual recognition, semantic segmentation, optical flow, and depth estimation to vocalization discrimination, are highly redundant functions of their input data. Images or spectrograms, projected into different subspaces formed by orthogonal bases in the pixel, Fourier, or wavelet domains, can be used to solve these tasks remarkably well, regardless of whether the projection targets the top subspace where the data vary the most, an intermediate subspace with moderate variability, or the bottom subspace where the data vary the least. This phenomenon occurs because different subspaces carry a large amount of redundant information relevant to the task.
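The abstract's core operation, splitting an orthogonal basis into top-, mid-, and bottom-variance subspaces and reconstructing the input from each one alone, can be sketched with a PCA basis in a few lines of NumPy. This is a minimal illustration on synthetic data, not the paper's actual experimental pipeline; all names, sizes, and the choice of PCA (rather than Fourier or wavelet bases) here are illustrative assumptions.

```python
import numpy as np

# Toy data standing in for flattened images or spectrograms
# (500 samples, 64 dimensions); purely illustrative.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 64)) @ rng.standard_normal((64, 64))
X = X - X.mean(axis=0)  # center before computing the PCA basis

# Orthogonal basis ordered by explained variance: rows of Vt are the
# principal directions, S holds the corresponding singular values.
U, S, Vt = np.linalg.svd(X, full_matrices=False)

def project(X, Vt, idx):
    """Reconstruct X from only the subspace spanned by the chosen basis rows."""
    B = Vt[idx]            # (k, d) orthonormal rows
    return X @ B.T @ B     # projection onto span(B)

k = 16
X_top = project(X, Vt, slice(0, k))        # highest-variance subspace
X_mid = project(X, Vt, slice(24, 24 + k))  # intermediate subspace
X_bot = project(X, Vt, slice(-k, None))    # lowest-variance subspace

# Fraction of total signal energy captured by each subspace.
total = (S ** 2).sum()
for name, idx in [("top", slice(0, k)),
                  ("mid", slice(24, 24 + k)),
                  ("bot", slice(-k, None))]:
    print(f"{name}-variance subspace: {(S[idx] ** 2).sum() / total:.1%} of energy")
```

Under the paper's claim, a model trained and evaluated on any one of `X_top`, `X_mid`, or `X_bot` would retain most of the original task performance, even though `X_bot` carries only a small fraction of the signal energy printed above.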