🤖 AI Summary
To address poor generalizability, difficult schema adaptation, and high maintenance costs when extracting knowledge from unstructured text (web pages and PDFs) across domains such as science and news, this paper proposes OneKE, a schema-guided multi-agent knowledge extraction system. Its architecture has multiple large language model-based agents collaborate in distinct roles, and combines schema-guided prompting, iterative refinement driven by a knowledge base, and containerized deployment. Dynamic schema adaptation and a closed-loop debugging mechanism, in which error cases are corrected and fed back through the knowledge base, enable end-to-end structured knowledge generation. Evaluation on multiple benchmark datasets is reported to improve generalizability and robustness over state-of-the-art baselines. The system is open-sourced and accompanied by a demo, supporting its practical deployability.
📝 Abstract
We introduce OneKE, a dockerized schema-guided knowledge extraction system that can extract knowledge from the Web and raw PDF books and supports various domains (science, news, etc.). Specifically, we design OneKE with multiple agents and a configurable knowledge base. Different agents perform their respective roles, enabling support for various extraction scenarios. The configurable knowledge base facilitates schema configuration as well as error-case debugging and correction, further improving performance. Empirical evaluations on benchmark datasets demonstrate OneKE's efficacy, while case studies further illustrate its adaptability to diverse tasks across multiple domains, highlighting its potential for broad applications. We have open-sourced the code at https://github.com/zjunlp/OneKE and released a video at http://oneke.openkg.cn/demo.mp4.
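
To make the idea of schema-guided extraction with a correction knowledge base more concrete, here is a minimal sketch in Python. It is not OneKE's actual API: the names `Schema`, `CaseStore`, `build_prompt`, and `extract` are hypothetical, the knowledge base is reduced to a list of corrected cases, and the language model is passed in as a plain function so the example runs without any external service.

```python
import json
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Schema:
    """Hypothetical target schema: field name -> short description."""
    name: str
    fields: dict[str, str]


@dataclass
class CaseStore:
    """Hypothetical stand-in for the knowledge base: a list of corrected cases."""
    cases: list[tuple[str, dict]] = field(default_factory=list)

    def add_correction(self, text: str, corrected: dict) -> None:
        # Record a human- or agent-corrected output for later reuse.
        self.cases.append((text, corrected))


def build_prompt(schema: Schema, text: str, store: CaseStore) -> str:
    """Assemble a prompt from the schema, stored corrections, and the input text."""
    lines = [f"Extract a '{schema.name}' record as JSON with exactly these keys:"]
    lines += [f"- {key}: {desc}" for key, desc in schema.fields.items()]
    for example_text, corrected in store.cases:
        lines.append(f"Example input: {example_text}")
        lines.append(f"Example output: {json.dumps(corrected, sort_keys=True)}")
    lines.append(f"Input: {text}")
    return "\n".join(lines)


def extract(schema: Schema, text: str, store: CaseStore,
            llm: Callable[[str], str]) -> dict:
    """Call the model (assumed to return JSON text) and conform the result to the schema."""
    raw = llm(build_prompt(schema, text, store))
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        parsed = {}
    if not isinstance(parsed, dict):
        parsed = {}
    # Keep only keys declared in the schema; missing keys become None.
    return {key: parsed.get(key) for key in schema.fields}
```

In this sketch the schema constrains both the prompt and the returned record, while each call to `add_correction` changes the prompts built afterwards, which is one simple way a debugging loop could feed back into extraction. A real multi-agent system would split these steps across separate roles and use a richer knowledge base.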