OracleAgent: A Multimodal Reasoning Agent for Oracle Bone Script Research

📅 2025-10-29
🤖 AI Summary
Oracle bone script (OBS) research faces two major bottlenecks: complex, multi-stage interpretation workflows that mix sequential and parallel subtasks, and inefficient organization and retrieval of resources. To address these challenges, the paper introduces OracleAgent, which the authors describe as the first agent system designed specifically for oracle bone script research. The system integrates large language models, multimodal reasoning, image retrieval, and knowledge graph technologies, and it is backed by a domain-specific multimodal knowledge base of over 1.4 million single-character rubbing images and 80K interpretation texts. Its modular agent architecture dynamically orchestrates specialized tools (including character recognition, glyph matching, and textual interpretation) to enable cross-modal semantic alignment and end-to-end task coordination. Experiments show that the system outperforms leading multimodal foundation models (e.g., GPT-4o) on multiple oracle bone script reasoning and generation tasks. A case study further indicates that it reduces the time domain experts spend on OBS research.

📝 Abstract
As one of the earliest writing systems, Oracle Bone Script (OBS) preserves the cultural and intellectual heritage of ancient civilizations. However, current OBS research faces two major challenges: (1) the interpretation of OBS involves a complex workflow comprising multiple serial and parallel sub-tasks, and (2) the efficiency of OBS information organization and retrieval remains a critical bottleneck, as scholars often spend substantial effort searching for, compiling, and managing relevant resources. To address these challenges, we present OracleAgent, the first agent system designed for the structured management and retrieval of OBS-related information. OracleAgent seamlessly integrates multiple OBS analysis tools, empowered by large language models (LLMs), and can flexibly orchestrate these components. Additionally, we construct a comprehensive domain-specific multimodal knowledge base for OBS, built through a rigorous multi-year process of data collection, cleaning, and expert annotation. The knowledge base comprises over 1.4M single-character rubbing images and 80K interpretation texts. OracleAgent leverages this resource through its multimodal tools to assist experts in retrieving characters, documents, interpretation texts, and rubbing images. Extensive experiments demonstrate that OracleAgent achieves superior performance across a range of multimodal reasoning and generation tasks, surpassing leading multimodal large language models (MLLMs) (e.g., GPT-4o). Furthermore, our case study illustrates that OracleAgent can effectively assist domain experts, significantly reducing the time cost of OBS research. These results highlight OracleAgent as a significant step toward the practical deployment of OBS-assisted research and automated interpretation systems.
Problem

Research questions and friction points this paper is trying to address.

Addresses complex multimodal interpretation workflow for Oracle Bone Script
Improves information organization and retrieval efficiency for scholars
Integrates specialized tools with large language models for analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates multiple analysis tools with large language models
Constructs a multimodal knowledge base with expert annotations
Leverages multimodal tools for retrieval and reasoning tasks
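
The orchestration idea in the points above, with some sub-tasks running in sequence and others in parallel, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the tool names, the `TOOLS` registry, and the `run_plan` helper are hypothetical, the tools are stubs that only echo their input, and the plan is fixed by hand, whereas in the described system an LLM would choose which tools to call. This is not the paper's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical stand-ins for OBS tools. They only wrap their input in a label;
# the real tools and their interfaces are not described in this summary.
def recognize_character(query: str) -> str:
    return f"recognized({query})"

def retrieve_rubbings(char: str) -> str:
    return f"rubbings({char})"

def retrieve_interpretations(char: str) -> str:
    return f"interpretations({char})"

# Hypothetical tool registry keyed by tool name.
TOOLS: Dict[str, Callable[[str], str]] = {
    "recognize": recognize_character,
    "rubbings": retrieve_rubbings,
    "interpretations": retrieve_interpretations,
}

def run_plan(query: str, plan: List[List[str]]) -> Dict[str, str]:
    """Run stages in order; tools listed in the same stage run in parallel.

    The output of a single-tool stage becomes the input of the next stage.
    A multi-tool stage leaves the current input unchanged.
    """
    results: Dict[str, str] = {}
    current = query
    with ThreadPoolExecutor() as pool:
        for stage in plan:
            # Submit every tool in this stage at once (parallel sub-tasks).
            futures = {name: pool.submit(TOOLS[name], current) for name in stage}
            # Wait for the whole stage before moving on (serial ordering).
            outputs = {name: fut.result() for name, fut in futures.items()}
            results.update(outputs)
            if len(stage) == 1:
                current = outputs[stage[0]]
    return results

if __name__ == "__main__":
    # Recognize first, then fetch rubbings and interpretations in parallel.
    plan = [["recognize"], ["rubbings", "interpretations"]]
    for name, value in run_plan("query_image", plan).items():
        print(name, "->", value)
```

The nested-list plan keeps the serial and parallel structure explicit: the outer list fixes the order of stages, and each inner list groups the tools that may run concurrently.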
Authors

Caoshuo Li: Xiamen University. Interests: Diffusion Model, Large Vision-Language Model, Vision Backbone
Zengmao Ding: Anyang Normal University
Xiaobin Hu: Tencent Youtu Lab; Technische Universität München (TUM). Interests: Deep learning, Computer vision, VLM, Agents
Bang Li: Anyang Normal University
Donghao Luo: Youtu lab@Tencent, Shanghai Jiao Tong University. Interests: CV, deep learning
Xu Peng: Associate Researcher, Harbin Institute of Technology. Interests: Legged Robot
Taisong Jin: Assistant Professor of Computer, Xiamen University. Interests: Graph Neural Network
Yongge Liu: Anyang Normal University
Shengwei Han: Anyang Normal University
Jing Yang: Anyang Normal University
Xiaoping He: Anyang Normal University
Feng Gao: Anyang Normal University
AndyPian Wu: Tencent SSV
SevenShu: Tencent SSV
Chaoyang Wang: Tencent SSV
Chengjie Wang: Tencent YouTu Lab