Machine Learning Modeling for Multi-order Human Visual Motion Processing

📅 2025-01-22
🤖 AI Summary
Problem: Many DNN-based optical flow models rely on luminance-change signals and the brightness constancy (intensity conservation) assumption, so they fail to capture the motion of higher-order image features (second-order motion) that humans perceive. Method: We propose a dual-pathway neural network modeled on the primate V1-MT visual pathway that handles both first-order (luminance-based) and second-order (non-luminance-based) motion. Its components are a trainable motion energy sensor bank, a recurrent graph network, and, in the second-order pathway, a multilayer 3D CNN block that applies nonlinear preprocessing before motion energy sensing. The model is trained with supervised learning on naturalistic videos, including novel motion datasets in which the moving objects have varying material properties. Results: The model replicates psychophysical and physiological findings on first-order motion perception. Training it to estimate object motion from non-Lambertian materials, such as glossy surfaces with highlights, gives it the capacity to perceive second-order motion, and the resulting model generalizes to both first- and second-order motion phenomena in natural scenes.

📝 Abstract
Our research aims to develop machines that learn to perceive visual motion as do humans. While recent advances in computer vision (CV) have enabled DNN-based models to accurately estimate optical flow in naturalistic images, a significant disparity remains between CV models and the biological visual system in both architecture and behavior. This disparity includes humans' ability to perceive the motion of higher-order image features (second-order motion), which many CV models fail to capture because of their reliance on the intensity conservation law. Our model architecture mimics the cortical V1-MT motion processing pathway, utilizing a trainable motion energy sensor bank and a recurrent graph network. Supervised learning employing diverse naturalistic videos allows the model to replicate psychophysical and physiological findings about first-order (luminance-based) motion perception. For second-order motion, inspired by neuroscientific findings, the model includes an additional sensing pathway with nonlinear preprocessing before motion energy sensing, implemented using a simple multilayer 3D CNN block. When exploring how the brain acquired the ability to perceive second-order motion in natural environments, in which pure second-order signals are rare, we hypothesized that second-order mechanisms were critical when estimating robust object motion amidst optical fluctuations, such as highlights on glossy surfaces. We trained our dual-pathway model on novel motion datasets with varying material properties of moving objects. We found that training to estimate object motion from non-Lambertian materials naturally endowed the model with the capacity to perceive second-order motion, as can humans. The resulting model effectively aligns with biological systems while generalizing to both first- and second-order motion phenomena in natural scenes.
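The role of nonlinear preprocessing before motion energy sensing can be illustrated with a toy example. The sketch below is not the paper's model: it assumes a frequency-domain stand-in for a motion energy sensor (opponent spectral power in space-time quadrants) and uses simple squaring in place of the learned 3D CNN block. The function name `direction_index`, the stimulus sizes, and the modulation depth are all hypothetical choices made for illustration.

```python
import numpy as np

def direction_index(stimulus):
    """Opponent motion energy of a (time, space) array, in [-1, 1].

    Positive values mean net rightward energy, negative net leftward.
    This is a frequency-domain stand-in for a motion energy sensor,
    not the trainable sensor bank described in the abstract.
    """
    power = np.abs(np.fft.fft2(stimulus)) ** 2
    kt = np.fft.fftfreq(stimulus.shape[0])[:, None]  # temporal frequency
    kx = np.fft.fftfreq(stimulus.shape[1])[None, :]  # spatial frequency
    # A pattern drifting rightward puts its power where kx and kt differ in sign.
    rightward = power[(kx * kt) < 0].sum()
    leftward = power[(kx * kt) > 0].sum()
    return (rightward - leftward) / (rightward + leftward)

T, N = 64, 64  # frames and pixels (assumed sizes)
t = np.arange(T)[:, None]
x = np.arange(N)[None, :]
phase = 2 * np.pi * (4 * x / N - 4 * t / T)  # wave drifting rightward

rng = np.random.default_rng(0)
carrier = rng.choice([-1.0, 1.0], size=(T, N))  # dynamic binary noise

# First-order stimulus: the luminance itself drifts.
first_order = np.sin(phase)
# Second-order stimulus: only the contrast envelope of the noise drifts.
second_order = (1.0 + 0.8 * np.sin(phase)) * carrier

linear_first = direction_index(first_order)
linear_second = direction_index(second_order)
# Squaring is an assumed, hand-picked nonlinearity standing in for the
# learned preprocessing; it exposes the envelope to the energy stage.
rectified_second = direction_index(second_order ** 2)

print("first-order, linear path:      ", linear_first)
print("second-order, linear path:     ", linear_second)
print("second-order, after squaring:  ", rectified_second)
```

In this toy setting the linear path reports a clear rightward signal for the luminance grating but stays close to zero for the contrast-modulated noise, whose expected power spectrum is direction-balanced. Once the stimulus is squared, the drifting envelope becomes a luminance-like signal and the same energy measure recovers the rightward direction.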
Problem

Research questions and friction points this paper is trying to address.

Machine Learning
Human Visual Motion Perception
Complex Scenes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Brain-inspired Visual Processing
Recurrent Neural Networks
3D Convolutional Preprocessing
Zitang Sun
Graduate School of Informatics, Kyoto University, Kyoto, 606-8501, Japan
Yen-Ju Chen
Graduate School of Informatics, Kyoto University, Kyoto, 606-8501, Japan
Yung-Hao Yang
Graduate School of Informatics, Kyoto University, Kyoto, 606-8501, Japan
Yuan Li
Graduate School of Informatics, Kyoto University, Kyoto, 606-8501, Japan
Shin'ya Nishida
Kyoto University, NTT
vision, perception