Reanimating Images using Neural Representations of Dynamic Stimuli

📅 2024-06-04
📈 Citations: 1
Influential citations: 0
🤖 AI Summary
Current computer vision models remain substantially inferior to humans at understanding dynamic motion, especially in realistic, complex scenes. To address this, we propose a brain-inspired video understanding paradigm: (1) decoding fine-grained, object-level optical flow from the fMRI responses of participants viewing videos; (2) leveraging a video diffusion model to disentangle static appearance representations from motion generation, and establishing a bidirectional enhancement mechanism between neural motion representations and artificial optical flow predictions; and (3) extending prior work to full video reconstruction from video-driven brain activity. Experiments demonstrate improved prediction of video-evoked fMRI responses and enable coherent, photorealistic video generation conditioned solely on the initial frame. Our framework provides an interpretable, generalizable neurocomputational foundation for cross-modal dynamic visual modeling.
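The decoding step in (1) can be pictured as a regularized linear readout from voxel responses to a dense flow field. The sketch below is a minimal illustration with synthetic data, not the authors' code; the voxel count, flow resolution, and the choice of ridge regression are all assumptions.

```python
# Minimal sketch (not the authors' code): decoding a dense optical-flow
# field from fMRI responses with a linear ridge decoder. Shapes and
# hyperparameters are illustrative assumptions.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

n_samples, n_voxels = 200, 5000      # one fMRI response vector per stimulus window
H, W = 36, 64                        # assumed coarse flow-field resolution

rng = np.random.default_rng(0)
X = rng.standard_normal((n_samples, n_voxels))   # brain activity (placeholder)
Y = rng.standard_normal((n_samples, H * W * 2))  # flattened (dx, dy) flow targets

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)

decoder = Ridge(alpha=1e3)           # L2 regularization for the wide voxel space
decoder.fit(X_tr, Y_tr)

flow_pred = decoder.predict(X_te).reshape(-1, H, W, 2)  # per-pixel motion vectors
print(flow_pred.shape)               # (40, 36, 64, 2)
```

In practice the targets would be flow fields extracted from the stimulus videos with an off-the-shelf estimator and aligned to the hemodynamically delayed fMRI responses.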

📝 Abstract
While computer vision models have made incredible strides in static image recognition, they still do not match human performance in tasks that require the understanding of complex, dynamic motion. This is notably true for real-world scenarios where embodied agents face complex and motion-rich environments. Our approach leverages state-of-the-art video diffusion models to decouple static image representation from motion generation, enabling us to utilize fMRI brain activity for a deeper understanding of human responses to dynamic visual stimuli. Conversely, we also demonstrate that information about the brain's representation of motion can enhance the prediction of optical flow in artificial systems. Our novel approach leads to four main findings: (1) Visual motion, represented as fine-grained, object-level resolution optical flow, can be decoded from brain activity generated by participants viewing video stimuli; (2) Video encoders outperform image-based models in predicting video-driven brain activity; (3) Brain-decoded motion signals enable realistic video reanimation based only on the initial frame of the video; and (4) We extend prior work to achieve full video decoding from video-driven brain activity. This framework advances our understanding of how the brain represents spatial and temporal information in dynamic visual scenes. Our findings demonstrate the potential of combining brain imaging with video diffusion models for developing more robust and biologically-inspired computer vision systems. We show additional decoding and encoding examples on this site: https://sites.google.com/view/neural-dynamics/home.
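Finding (2), that video encoders outperform image-based models at predicting video-driven brain activity, is conventionally measured with voxel-wise encoding models. The sketch below shows one such comparison on placeholder features; the encoders, regularization, and evaluation protocol used in the paper may differ.

```python
# Hedged sketch of an encoding-model comparison: fit a ridge regression
# from model features to voxel responses, then score held-out predictions
# by mean per-voxel Pearson correlation. All arrays are placeholders.
import numpy as np
from sklearn.linear_model import Ridge

def encoding_score(features, fmri, n_train=160):
    """Mean per-voxel Pearson r of a ridge encoding model on held-out data."""
    model = Ridge(alpha=1e2).fit(features[:n_train], fmri[:n_train])
    pred, true = model.predict(features[n_train:]), fmri[n_train:]
    pz = (pred - pred.mean(0)) / (pred.std(0) + 1e-8)   # z-score per voxel
    tz = (true - true.mean(0)) / (true.std(0) + 1e-8)
    return float((pz * tz).mean(0).mean())              # average correlation

rng = np.random.default_rng(1)
fmri = rng.standard_normal((200, 5000))
video_feats = rng.standard_normal((200, 768))  # e.g. a video-encoder embedding
image_feats = rng.standard_normal((200, 768))  # e.g. a frame-wise image embedding

print("video:", encoding_score(video_feats, fmri))
print("image:", encoding_score(image_feats, fmri))
```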
Problem

Research questions and friction points this paper is trying to address.

Decoding visual motion from brain activity measured with fMRI
Enhancing optical flow prediction with brain motion representations (one possible fusion rule is sketched after this list)
Reanimating static images into videos using brain-decoded motion signals
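For the second point above, one simple way brain motion representations could refine an artificial flow estimate is confidence-weighted blending. The fusion rule below is purely illustrative; the paper describes a learned, bidirectional enhancement mechanism, not this fixed rule.

```python
# Illustrative only: blend a model's optical-flow estimate with a
# brain-decoded flow field using a per-pixel confidence map in [0, 1].
import numpy as np

def fuse_flow(model_flow, brain_flow, brain_conf):
    """Confidence-weighted blend of two (H, W, 2) flow fields."""
    w = brain_conf[..., None]                 # broadcast over the (dx, dy) channel
    return (1.0 - w) * model_flow + w * brain_flow

H, W = 36, 64
rng = np.random.default_rng(2)
model_flow = rng.standard_normal((H, W, 2))   # e.g. from a learned flow estimator
brain_flow = rng.standard_normal((H, W, 2))   # decoded from fMRI
conf = rng.uniform(0.0, 1.0, (H, W))          # assumed decoder reliability map

print(fuse_flow(model_flow, brain_flow, conf).shape)  # (36, 64, 2)
```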
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses fMRI responses to dynamic visual stimuli to study motion understanding
Leverages video diffusion models to decouple static appearance from motion generation
Decodes visual motion from brain activity to reanimate still frames (see the sketch after this list)
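As a rough analogue of the reanimation result, the snippet below animates a single frame with a public image-to-video diffusion pipeline (Stable Video Diffusion via Hugging Face diffusers). Unlike the paper's model, this pipeline conditions only on the first frame, not on brain-decoded motion, and the input path is hypothetical.

```python
# Stand-in for the reanimation step: image-to-video generation from an
# initial frame with Stable Video Diffusion (no brain conditioning here).
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16
).to("cuda")

first_frame = load_image("initial_frame.png").resize((1024, 576))  # hypothetical file
frames = pipe(first_frame, decode_chunk_size=8,
              generator=torch.manual_seed(0)).frames[0]
export_to_video(frames, "reanimated.mp4", fps=7)
```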