🤖 AI Summary
Problem: Existing motion estimation models rely on luminance-change signals and on assumptions such as brightness constancy, so they struggle to robustly perceive motion carried by non-luminance cues such as texture and specularity.
Method: We propose a dual-pathway neural network inspired by the primate V1-MT visual pathway that jointly models first-order (luminance-based) and second-order (non-luminance-based) motion. Its key components are a trainable motion energy sensor bank, a multilayer 3D CNN block that applies nonlinear preprocessing ahead of motion energy sensing in the second-order pathway, and a recurrent graph network. The model is trained with supervision on naturalistic videos, including novel datasets in which the material properties of moving objects vary. Sensitivity to second-order motion emerges from training to estimate the motion of non-Lambertian objects. The design avoids the brightness constancy constraint that conventional optical flow methods rely on.
Results: The model replicates psychophysical and neurophysiological findings on human motion perception. After training on novel motion datasets with varying material properties, it estimates object motion more robustly under non-Lambertian conditions such as specular highlights, and it generalizes to both first- and second-order motion in natural scenes.
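For reference, the brightness constancy assumption mentioned in this summary can be written as below. This is the standard textbook formulation rather than an equation taken from the paper; the symbols $I$ (image intensity) and $(u, v)$ (image velocity) are introduced here for illustration only.

```latex
% Brightness constancy: a moving point keeps its intensity over a short time step
I(x + u\,\delta t,\; y + v\,\delta t,\; t + \delta t) = I(x, y, t)

% First-order Taylor expansion gives the optical flow constraint
\frac{\partial I}{\partial x}\,u + \frac{\partial I}{\partial y}\,v + \frac{\partial I}{\partial t} = 0
```

A specular highlight slides across a glossy surface as the object moves, so the intensity at a given surface point changes over time and the equality no longer holds.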
📝 Abstract
Our research aims to develop machines that learn to perceive visual motion as humans do. While recent advances in computer vision (CV) have enabled DNN-based models to accurately estimate optical flow in naturalistic images, a significant disparity remains between CV models and the biological visual system in both architecture and behavior. This disparity includes humans' ability to perceive the motion of higher-order image features (second-order motion), which many CV models fail to capture because of their reliance on the intensity conservation law (brightness constancy). Our model architecture mimics the cortical V1-MT motion processing pathway, using a trainable motion energy sensor bank and a recurrent graph network. Supervised learning on diverse naturalistic videos allows the model to replicate psychophysical and physiological findings about first-order (luminance-based) motion perception. For second-order motion, inspired by neuroscientific findings, the model includes an additional sensing pathway that applies nonlinear preprocessing before motion energy sensing, implemented as a simple multilayer 3D CNN block. In exploring how the brain acquired the ability to perceive second-order motion in natural environments, where pure second-order signals are rare, we hypothesized that second-order mechanisms are critical for robustly estimating object motion amid optical fluctuations, such as highlights on glossy surfaces. We trained our dual-pathway model on novel motion datasets in which the material properties of moving objects vary. We found that training to estimate object motion from non-Lambertian materials naturally endowed the model with the capacity to perceive second-order motion, as humans can. The resulting model aligns with biological systems while generalizing to both first- and second-order motion phenomena in natural scenes.
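To make the roles of motion energy sensing and nonlinear preprocessing concrete, here is a minimal sketch in NumPy. It is not the paper's model: the learned 3D CNN preprocessor is replaced by a fixed full-wave rectifier (the classical filter-rectify-filter idea), the sensor is a single hand-tuned quadrature filter pair rather than a trainable bank, and the stimulus has one spatial dimension. The function name `opponent_energy`, the stimulus sizes, and all parameter values are assumptions chosen for illustration.

```python
import numpy as np

def opponent_energy(stim, fx0, ft0, sigma=0.02):
    """Normalized opponent motion energy of a (time, space) stimulus.

    Returns (R - L) / (R + L): values near +1 indicate rightward energy,
    values near -1 leftward energy, values near 0 no net direction.
    """
    n_t, n_x = stim.shape
    ft = np.fft.fftfreq(n_t)[:, None]  # temporal frequency, cycles/frame
    fx = np.fft.fftfreq(n_x)[None, :]  # spatial frequency, cycles/pixel
    spectrum = np.fft.fft2(stim - stim.mean())

    def energy(fx_c, ft_c):
        # A single Gaussian bump in frequency is a complex space-time Gabor.
        # Its real and imaginary parts form a quadrature pair, so the squared
        # magnitude of the response is a phase-invariant motion energy.
        gain = np.exp(-((fx - fx_c) ** 2 + (ft - ft_c) ** 2) / (2 * sigma ** 2))
        return np.mean(np.abs(np.fft.ifft2(spectrum * gain)) ** 2)

    right = energy(fx0, -ft0)  # tuned to cos(2*pi*(fx0*x - ft0*t)), drifting right
    left = energy(fx0, ft0)    # mirror-image sensor tuned to leftward drift
    return (right - left) / (right + left)

# Illustrative stimulus parameters (assumed, not from the paper).
rng = np.random.default_rng(0)
n_t, n_x = 256, 512
fx0, ft0 = 32 / n_x, 16 / n_t
t = np.arange(n_t)[:, None]
x = np.arange(n_x)[None, :]
wave = np.cos(2 * np.pi * (fx0 * x - ft0 * t))  # rightward-drifting sinusoid

# First-order motion: the luminance pattern itself drifts.
first_order = wave

# Second-order motion: only the contrast envelope of dynamic noise drifts.
noise = rng.choice([-1.0, 1.0], size=(n_t, n_x))
second_order = (1 + 0.8 * wave) * noise

e_first = opponent_energy(first_order, fx0, ft0)
e_second_linear = opponent_energy(second_order, fx0, ft0)
e_second_rectified = opponent_energy(np.abs(second_order), fx0, ft0)

print("first-order, linear pathway:     ", e_first)
print("second-order, linear pathway:    ", e_second_linear)
print("second-order, rectified pathway: ", e_second_rectified)
```

The linear sensor signals clear rightward motion for the drifting grating but shows little net direction for the contrast-modulated noise, because the noise carrier spreads energy roughly evenly across both directions. Rectifying the stimulus first recovers the drifting envelope, after which the same sensor responds to it. In the paper this preprocessing is learned from data rather than fixed by hand.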