🤖 AI Summary
Many oracle bone inscriptions (OBIs) from the Shang Dynasty remain undeciphered, and the scarcity of text makes their decipherment a cold-start problem. To address it, this paper proposes OBSD, the first generative decipherment framework based on conditional diffusion models. Unlike conventional NLP paradigms that rely on large-scale textual corpora, OBSD models glyphs as images and generates semantically relevant glyph cues in an "image-to-image" manner, supporting few-shot glyph analogy and semantic association reasoning. Experiments on a curated oracle bone inscription dataset provide quantitative evidence that OBSD is effective at conjecturing unknown characters. The generated glyph cues are also interpretable, giving experts human-verifiable evidence to support decipherment. With code and decipherment results to be released publicly, the work moves AI research on ancient scripts from discriminative toward generative modeling.
📝 Abstract
Originating from China's Shang Dynasty approximately 3,000 years ago, the Oracle Bone Script (OBS) is a cornerstone in the annals of linguistic history, predating many established writing systems. Despite the discovery of thousands of inscriptions, a vast expanse of OBS remains undeciphered, casting a veil of mystery over this ancient language. The emergence of modern AI technologies presents a novel frontier for OBS decipherment, challenging traditional NLP methods that rely heavily on large textual corpora, a luxury not afforded by historical languages. This paper introduces a novel approach by adopting image generation techniques, specifically through the development of Oracle Bone Script Decipher (OBSD). Utilizing a conditional diffusion-based strategy, OBSD generates vital clues for decipherment, charting a new course for AI-assisted analysis of ancient languages. To validate its efficacy, extensive experiments were conducted on an oracle bone script dataset, with quantitative results demonstrating the effectiveness of OBSD. Code and decipherment results will be made available at https://github.com/guanhaisu/OBSD.