🤖 AI Summary
Current deep learning-based image quality assessment (IQA) models suffer from excessive parameter counts, poor interpretability, and susceptibility to overfitting. To address these limitations, we propose a biologically inspired parametric neural architecture that embeds human visual priors directly into the design of its layers, shifting from purely data-driven regression toward modeling of perceptual mechanisms. The main findings are threefold: (1) the parametric models reduce the number of parameters by three orders of magnitude (about 99.9%) without losing regression performance relative to the non-parametric version; (2) we identify the feature-spreading problem, in which models initialized with bioplausible parameters drift away from bioplausible solutions when trained for regression on human-rated datasets; (3) the parametric models behave better during training and are easier to interpret as vision models. Together, these results argue for evaluating and training IQA models beyond black-box regression.
📝 Abstract
Human vision models are at the core of image processing. For instance, classical approaches to the problem of image quality are based on models that include knowledge about human vision. Nowadays, however, deep learning approaches obtain competitive results by simply treating this problem as a regression of human decisions and training a standard network on human-rated datasets. These approaches have the advantages of being easily adaptable to a particular problem and of fitting very efficiently when data is available. However, mainly due to their excess of parameters, they suffer from a lack of interpretability and from over-fitting. Here we propose a vision model that combines the best of both worlds by using a parametric neural network architecture. We parameterize the layers to have bioplausible functionality, and provide a set of bioplausible parameters. We analyze different versions of the model and compare them with the non-parametric version. The parametric models achieve a reduction of three orders of magnitude in the number of parameters without any loss in regression performance. Furthermore, we show that the parametric models behave better during training and are easier to interpret as vision models. Interestingly, we find that, even when initialized with bioplausible parameters, the models diverge from bioplausible solutions when trained for regression using human-rated datasets, a behavior we call the feature-spreading problem. This suggests that the deep learning approach is inherently flawed, and emphasizes the need to evaluate and train models beyond regression.
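To make the idea of a parametric layer concrete, the following is a minimal sketch, not the architecture described in the abstract. It assumes a hypothetical layer whose convolution kernel is constrained to be a Gaussian, so that the only learnable quantity is the width `sigma`, and compares that with an unconstrained kernel of the same size. The function name `gaussian_kernel`, the kernel size of 31, and the value `sigma=4.0` are illustrative assumptions and do not come from the paper.

```python
import numpy as np


def gaussian_kernel(sigma: float, size: int = 31) -> np.ndarray:
    """Build a normalized 2-D Gaussian kernel from a single parameter, sigma.

    Hypothetical example of a functional form for a parametric layer;
    the abstract does not say which forms the actual model uses.
    """
    half = size // 2
    coords = np.arange(-half, half + 1, dtype=np.float64)
    xx, yy = np.meshgrid(coords, coords)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return kernel / kernel.sum()  # weights sum to 1


size = 31
free_params = size * size    # unconstrained kernel: every weight is learned
parametric_params = 1        # parametric kernel: only sigma is learned

kernel = gaussian_kernel(sigma=4.0, size=size)
print("kernel shape:", kernel.shape)
print("free parameters:", free_params)
print("parametric parameters:", parametric_params)
```

In this toy case a single kernel drops from 961 learnable weights to one. The figures are specific to the sketch and are not the parameter counts reported for the model, but they show how fixing the functional form of each filter can shrink the parameter count by orders of magnitude while keeping every remaining parameter interpretable.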