🤖 AI Summary
Problem: Large language models (LLMs) are hard to keep up to date over their lifetime: full retraining is prohibitively expensive, and long-term factual consistency is difficult to maintain.
Method: We introduce WikiBigEdit, a large-scale, real-world lifelong knowledge editing benchmark with over 500K QA pairs, built via an automated, scalable framework that combines Wikidata’s edit logs with automated QA generation. We systematically evaluate state-of-the-art knowledge editing methods (e.g., ROME, MEMIT), retrieval-augmented generation (RAG), and parameter-efficient continual fine-tuning under massive, sequential editing scenarios.
Contributions/Results: We identify critical failure modes: mainstream editing methods suffer severe performance degradation after more than 1K consecutive edits, exposing fundamental bottlenecks in scalability, edit persistence, and generalization. WikiBigEdit provides a reproducible evaluation pipeline that is designed to be extended automatically with future edits, and it clarifies the practical limits of current lifelong knowledge editing.
📝 Abstract
Keeping large language models factually up-to-date is crucial for deployment, yet the cost of retraining remains a challenge. Knowledge editing offers a promising alternative, but existing methods are tested only on small-scale or synthetic edit benchmarks. In this work, we aim to connect research on lifelong knowledge editing to real-world edits at a practically relevant scale. We first introduce WikiBigEdit, a large-scale benchmark of real-world Wikidata edits, built to be extended automatically over time for future-proof benchmarking. In its first instance, it includes over 500K question-answer pairs for knowledge editing alongside a comprehensive evaluation pipeline. Finally, we use WikiBigEdit to study how well existing knowledge editing techniques incorporate large volumes of real-world facts, and we contrast their capabilities with those of generic modification techniques such as retrieval augmentation and continual finetuning, to obtain a complete picture of the practical extent of current lifelong knowledge editing.