🤖 AI Summary
Existing neural graph databases suffer from low training efficiency and limited expressiveness due to query-level batching and structure-specific embeddings. This work proposes an operator-level training framework that decouples logical operators from query topology, recasting the training loop as dynamically scheduled dataflow execution with multi-stream parallel computation; a complementary decoupled architecture integrates high-dimensional semantic priors from pretrained text encoders while avoiding I/O stalls and memory overflows. The approach achieves 1.8–6.8× throughput gains across six benchmarks, sustains high GPU utilization, and alleviates representation friction in hybrid neuro-symbolic reasoning.
📝 Abstract
Neural Graph Databases (NGDBs) facilitate complex logical reasoning over incomplete knowledge structures, yet their training efficiency and expressivity are constrained by rigid query-level batching and structure-exclusive embeddings. We present NGDB-Zoo, a unified framework that resolves these bottlenecks by combining operator-level training with semantic augmentation. By decoupling logical operators from query topologies, NGDB-Zoo transforms the training loop into dynamically scheduled dataflow execution, enabling multi-stream parallelism and achieving a $1.8\times$–$6.8\times$ throughput improvement over baselines. Furthermore, we formalize a decoupled architecture that integrates high-dimensional semantic priors from Pre-trained Text Encoders (PTEs) without triggering I/O stalls or memory overflows. Extensive evaluations on six benchmarks, including massive graphs such as ogbl-wikikg2 and ATLAS-Wiki, demonstrate that NGDB-Zoo maintains high GPU utilization across diverse logical patterns and significantly mitigates representation friction in hybrid neuro-symbolic reasoning.
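To make the core idea concrete, the following is a minimal, hypothetical sketch (not the paper's actual implementation) of operator-level scheduling: each query is a small DAG of logical operators, and instead of batching whole queries, ready operators of the same type are grouped across queries in dependency order, so each group could in principle be dispatched as one fused kernel launch. All function and field names here are illustrative assumptions.

```python
from collections import defaultdict, deque

def schedule_operator_batches(queries):
    """Hypothetical operator-level scheduler.

    queries: list of query DAGs; each DAG is a dict
             op_id -> (op_type, [dependency_op_ids]).
    Returns a list of (op_type, [(query_idx, op_id), ...]) batches,
    emitted in dataflow (dependency-respecting) order, where each batch
    groups same-type operators from *different* query topologies.
    """
    # Count unmet dependencies per (query, operator) node.
    indeg = {}
    dependents = defaultdict(list)
    for qi, dag in enumerate(queries):
        for op_id, (op_type, deps) in dag.items():
            indeg[(qi, op_id)] = len(deps)
            for d in deps:
                dependents[(qi, d)].append((qi, op_id))

    # Kahn-style frontier: all operators whose inputs are already available.
    ready = deque(node for node, n in indeg.items() if n == 0)
    batches = []
    while ready:
        # Group the current ready frontier by operator type, so each
        # group corresponds to one batched launch (e.g. on its own stream).
        by_type = defaultdict(list)
        for _ in range(len(ready)):
            qi, op_id = ready.popleft()
            by_type[queries[qi][op_id][0]].append((qi, op_id))
        for op_type in sorted(by_type):
            batch = by_type[op_type]
            batches.append((op_type, batch))
            # Releasing a batch may unlock downstream operators.
            for node in batch:
                for succ in dependents[node]:
                    indeg[succ] -= 1
                    if indeg[succ] == 0:
                        ready.append(succ)
    return batches

# Two queries with different topologies: their projection operators are
# nevertheless batched together in the first dataflow step.
q0 = {"a": ("proj", []), "b": ("proj", []), "c": ("inter", ["a", "b"])}
q1 = {"x": ("proj", []), "y": ("neg", ["x"])}
plan = schedule_operator_batches([q0, q1])
```

In a real system each batch would map to a fused GPU kernel, and independent batches in the same frontier (here `inter` and `neg`) could run concurrently on separate CUDA streams; this toy only computes the schedule.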