SpikeGrasp: A Benchmark for 6-DoF Grasp Pose Detection from Stereo Spike Streams

📅 2025-10-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Conventional robotic grasping relies on explicit 3D point cloud reconstruction, whereas biological vision achieves grasping without such intermediate representations. Method: We propose a neuro-inspired, end-to-end, spiking-driven grasping framework that predicts 6-DoF grasp poses directly from asynchronous stereo spike streams, bypassing point cloud reconstruction entirely. Our approach fuses stereo spike data with a recurrent spiking neural network (SNN) to emulate retinal sensing and the dorsal visual pathway, enabling iterative refinement of grasp hypotheses. Contribution/Results: On a large-scale synthetic dataset we curated, our method achieves superior data efficiency and robustness, outperforming state-of-the-art point-cloud-based approaches in textureless, cluttered, and dynamic scenes. To our knowledge, it is the first to demonstrate the feasibility of high-accuracy, purely event-driven 6-DoF grasping.

📝 Abstract
Most robotic grasping systems rely on converting sensor data into explicit 3D point clouds, a computational step with no counterpart in biological intelligence. This paper explores a fundamentally different, neuro-inspired paradigm for 6-DoF grasp detection. We introduce SpikeGrasp, a framework that mimics the biological visuomotor pathway: it processes raw, asynchronous events from stereo spike cameras, much as retinas do, to infer grasp poses directly. Our model fuses these stereo spike streams and uses a recurrent spiking neural network, analogous to high-level visual processing, to iteratively refine grasp hypotheses without ever reconstructing a point cloud. To validate this approach, we built a large-scale synthetic benchmark dataset. Experiments show that SpikeGrasp surpasses traditional point-cloud-based baselines, especially in cluttered and textureless scenes, and demonstrates remarkable data efficiency. By establishing the viability of this end-to-end, neuro-inspired approach, SpikeGrasp paves the way for future systems capable of the fluid, efficient manipulation seen in nature, particularly for dynamic objects.
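The pipeline the abstract describes (stereo spike fusion → recurrent SNN → iterative grasp refinement) can be sketched in miniature. Everything below is hypothetical: the paper's actual network, fusion scheme, and training are not specified here. The toy NumPy loop only illustrates the general shape of a leaky integrate-and-fire recurrence that accumulates spiking evidence toward a 6-DoF hypothesis over time steps.

```python
import numpy as np

def lif_step(v, spikes_in, w, tau=0.9, v_th=1.0):
    """One leaky integrate-and-fire update: decay, integrate, fire, hard-reset."""
    v = tau * v + spikes_in @ w           # leak + synaptic integration
    out = (v >= v_th).astype(np.float32)  # binary output spikes
    v = v * (1.0 - out)                   # reset membrane where a spike fired
    return v, out

def refine_grasp(left_stream, right_stream, n_hidden=32, seed=0):
    """Toy recurrent spiking loop (illustrative only): fuse stereo spike
    frames and iteratively nudge a 6-DoF grasp hypothesis (xyz + rpy)."""
    rng = np.random.default_rng(seed)
    n_in = left_stream.shape[1] * 2
    w_in = rng.normal(0, 0.5, (n_in, n_hidden)).astype(np.float32)
    w_out = rng.normal(0, 0.1, (n_hidden, 6)).astype(np.float32)
    v = np.zeros(n_hidden, dtype=np.float32)
    grasp = np.zeros(6, dtype=np.float32)        # running hypothesis
    for l, r in zip(left_stream, right_stream):  # one spike frame per step
        fused = np.concatenate([l, r])           # naive stereo fusion
        v, s = lif_step(v, fused, w_in)
        grasp += 0.1 * (s @ w_out)               # iterative refinement
    return grasp
```

In a real system the readout would be trained and the fusion far richer; the point is only that the hypothesis is refined step by step from raw spikes, with no point cloud in between.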
Problem

Research questions and friction points this paper is trying to address.

Detecting 6-DoF grasp poses from stereo spike streams without point clouds
Developing neuro-inspired vision for robotic grasping in cluttered scenes
Creating efficient spike-based grasping systems for dynamic object manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses stereo spike cameras mimicking biological retinas
Employs recurrent spiking neural network for grasp refinement
Directly infers grasp poses without point cloud reconstruction
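For concreteness, a 6-DoF grasp pose is commonly parameterized as a 3-DoF translation plus a 3-DoF rotation of the gripper frame, often with a jaw width attached. The minimal container below illustrates that general convention; it is not the paper's representation, and the names and default width are assumptions.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class GraspPose:
    """A 6-DoF grasp: 3-DoF translation + 3-DoF rotation (+ gripper width)."""
    translation: np.ndarray  # (3,) position of the gripper frame, metres
    rotation: np.ndarray     # (3, 3) rotation matrix of the gripper frame
    width: float = 0.08      # jaw opening, metres (assumed default)

    def as_matrix(self):
        """Homogeneous 4x4 transform of the gripper frame."""
        T = np.eye(4)
        T[:3, :3] = self.rotation
        T[:3, 3] = self.translation
        return T
```

Predicting such a pose directly from spike streams is what lets the framework skip the intermediate point cloud entirely.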
Zhuoheng Gao
National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University
Jiyao Zhang
Peking University (Embodied AI · Robotics · 3D Vision)
Zhiyong Xie
National Key Laboratory for Multimedia Information Processing, School of Computer Science, Peking University
Hao Dong
Center on Frontiers of Computing Studies, School of Computer Science, Peking University
Zhaofei Yu
Peking University (Brain-inspired Computing · Spiking Neural Networks · Computational Neuroscience)
Rongmei Chen
School of Electronics, Peking University
Guozhang Chen
Peking University (Computational Neuroscience · Artificial Intelligence · Statistical Physics)
Tiejun Huang
Professor, School of Computer Science, Peking University (Visual Information Processing)