ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data

📅 2024-07-17
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of evaluating instruction data efficacy in Document Visual Question Answering (DocVQA), this paper proposes an evaluation paradigm that assesses the *execution process* of each instruction rather than its textual content. Methodologically, it introduces: (1) ProcTag, a process-tagging framework that uses the diversity and complexity of tags for fine-grained, quantitative assessment of instruction quality; (2) DocLayPrompt, a semi-structured, layout-aware prompting strategy for representing documents; and (3) a ProcTag-based selective sampling and filtering mechanism. Experiments on open-sourced and generated document VQA/instruction datasets show that sampling with ProcTag significantly outperforms existing text-based methods for evaluating instruction data; on the generated datasets, only 30.5% of the instructions are needed to match the efficacy of the full dataset. The code is publicly available.

📝 Abstract
Recently, large language models (LLMs) and multimodal large language models (MLLMs) have demonstrated promising results on the document visual question answering (VQA) task, particularly after training on document instruction datasets. An effective evaluation method for document instruction data is crucial in constructing instruction data with high efficacy, which, in turn, facilitates the training of LLMs and MLLMs for document VQA. However, most existing evaluation methods for instruction data are limited to the textual content of the instructions themselves, thereby hindering the effective assessment of document instruction datasets and constraining their construction. In this paper, we propose ProcTag, a data-oriented method that assesses the efficacy of document instruction data. ProcTag innovatively performs tagging on the execution process of instructions rather than the instruction text itself. By leveraging the diversity and complexity of these tags to assess the efficacy of the given dataset, ProcTag enables selective sampling or filtering of document instructions. Furthermore, DocLayPrompt, a novel semi-structured layout-aware document prompting strategy, is proposed for effectively representing documents. Experiments demonstrate that sampling existing open-sourced and generated document VQA/instruction datasets with ProcTag significantly outperforms current methods for evaluating instruction data. Impressively, with ProcTag-based sampling in the generated document datasets, only 30.5% of the document instructions are required to achieve 100% efficacy compared to the complete dataset. The code is publicly available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/DocumentUnderstanding/ProcTag.
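The abstract's core idea — tag each instruction's execution process, then sample for tag diversity — can be sketched as a greedy max-coverage heuristic. The function name, the tag vocabulary, and the greedy criterion below are illustrative assumptions, not the paper's actual algorithm:

```python
# Sketch of tag-diversity-based selective sampling in the spirit of
# ProcTag: each instruction carries a set of process tags (the steps of
# its execution), and we greedily pick instructions that add the most
# not-yet-covered tags. All names here are hypothetical.
from typing import Dict, List, Set

def select_by_tag_coverage(
    tagged_instructions: Dict[str, Set[str]],
    budget: int,
) -> List[str]:
    """Greedy max-coverage selection over process tags."""
    covered: Set[str] = set()
    selected: List[str] = []
    remaining = dict(tagged_instructions)
    while remaining and len(selected) < budget:
        # Instruction contributing the most new tags wins this round.
        best = max(remaining, key=lambda k: len(remaining[k] - covered))
        if not remaining[best] - covered:
            break  # nothing left adds new process coverage
        covered |= remaining[best]
        selected.append(best)
        del remaining[best]
    return selected

data = {
    "q1": {"locate_field", "read_value"},
    "q2": {"locate_field", "read_value"},        # duplicate process
    "q3": {"locate_table", "compare_cells", "sum"},
    "q4": {"read_value"},
}
print(select_by_tag_coverage(data, budget=2))  # → ['q3', 'q1']
```

Note that duplicates like `q2` are filtered out automatically, which matches the paper's claim that a small, process-diverse subset can match full-dataset efficacy.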
Problem

Research questions and friction points this paper is trying to address.

Existing evaluation methods for instruction data examine only the textual content of instructions, not how they are executed.
Without a reliable efficacy measure, constructing high-efficacy document instruction data for training LLMs and MLLMs on document VQA is difficult.
Documents are hard to represent effectively in prompts without modeling their layout structure.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Process tagging for document instruction efficacy
Semi-structured layout-aware document prompting
Selective sampling with ProcTag improves dataset efficiency
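The second innovation, layout-aware document prompting, can be illustrated by serializing OCR tokens into layout lines tagged with bounding boxes. The schema below (the `<line box=...>` format, the line-grouping tolerance, and all names) is an assumption for illustration; the paper's actual DocLayPrompt format may differ:

```python
# Minimal sketch of a semi-structured, layout-aware document prompt in
# the spirit of DocLayPrompt: OCR tokens with boxes are grouped into
# lines and serialized with coordinates, so the model sees structure
# as well as text. The serialization format is hypothetical.
from typing import List, Tuple

Token = Tuple[str, int, int, int, int]  # text, x0, y0, x1, y1

def doc_layout_prompt(tokens: List[Token], y_tol: int = 8) -> str:
    """Group tokens into lines by vertical proximity and emit each
    line with its merged bounding box."""
    tokens = sorted(tokens, key=lambda t: (t[2], t[1]))  # top-down, left-right
    lines: List[List[Token]] = []
    for tok in tokens:
        if lines and abs(tok[2] - lines[-1][0][2]) <= y_tol:
            lines[-1].append(tok)   # same visual line
        else:
            lines.append([tok])     # start a new line
    parts = []
    for line in lines:
        x0 = min(t[1] for t in line); y0 = min(t[2] for t in line)
        x1 = max(t[3] for t in line); y1 = max(t[4] for t in line)
        text = " ".join(t[0] for t in line)
        parts.append(f"<line box=({x0},{y0},{x1},{y1})> {text}")
    return "\n".join(parts)

ocr = [("Invoice", 40, 10, 120, 30), ("#1234", 130, 12, 180, 30),
       ("Total:", 40, 60, 90, 80), ("$99.00", 100, 61, 160, 80)]
print(doc_layout_prompt(ocr))
# → <line box=(40,10,180,30)> Invoice #1234
#   <line box=(40,60,160,80)> Total: $99.00
```

Feeding such a representation to an LLM lets questions like "What is the total?" be answered using spatial cues, which plain OCR text concatenation discards.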