Beyond Isolated Dots: Benchmarking Structured Table Construction as Deep Knowledge Extraction

📅 2025-07-22
🤖 AI Summary
Current large language models (LLMs) struggle to accurately extract fragmented information from complex real-world documents (e.g., academic papers, reports) and dynamically construct structured tables, often producing disorganized, non-auditable paragraph-style outputs. Method: We introduce AOE, the first bilingual, dynamic schema generation benchmark for text-to-table conversion, spanning three domains and 11 tasks and requiring models to adaptively infer context-sensitive table schemas from inputs. AOE departs from conventional fixed-schema paradigms by incorporating multi-length inputs, traceable reasoning steps, and deep knowledge integration, supported by human-crafted diverse queries and gold-standard structured answers. Contribution/Results: Extensive evaluation reveals significant performance gaps across state-of-the-art open- and closed-source LLMs on AOE, exposing fundamental weaknesses in structured reasoning and information organization. These findings highlight critical bottlenecks and provide concrete directions for advancing robust, schema-agnostic table generation capabilities.

📝 Abstract
With the emergence of large language models (LLMs), there is an expectation that LLMs can effectively extract explicit information from complex real-world documents (e.g., papers, reports). However, most LLMs generate paragraph-style answers that are chaotic, disorganized, and untraceable. To bridge this gap, we introduce the Arranged and Organized Extraction Benchmark (AOE), a new bilingual benchmark with data and documents of varying lengths designed to systematically evaluate the ability of LLMs to comprehend fragmented documents and reconstruct isolated information into one organized table. Unlike conventional text-to-table tasks, which rely on fixed schemas and narrow task domains, AOE includes 11 carefully crafted tasks across three diverse domains, requiring models to generate context-specific schemas tailored to varied input queries. In the experiment, we evaluated both open-source and closed-source state-of-the-art LLMs. The results show that even the most advanced models struggled significantly. The benchmark is available at https://huggingface.co/datasets/tianyumyum/AOE.
Problem

Research questions and friction points this paper is trying to address.

Evaluate LLMs' ability to extract and organize fragmented document information into structured tables
Assess performance of LLMs in generating context-specific schema for diverse queries
Benchmark LLMs' comprehension of complex documents across multiple domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bilingual benchmark for structured extraction
Context-specific schema generation
Evaluates LLMs on fragmented documents
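The page does not detail AOE's actual scoring protocol, but the core difficulty it evaluates, judging a table whose schema the model chose itself, can be illustrated concretely. Below is a minimal, hypothetical sketch (not the paper's metric) of scoring a generated table against a gold table: align columns by exact header match, then compute F1 over positioned cells, so credit is given only for values placed under the right header and row.

```python
# Illustrative sketch only: AOE's actual evaluation protocol is not
# described on this page. Tables are represented as
# {column_header: [cell values]}, and scoring compares
# (column, row_index, value) triples between prediction and gold.

def table_f1(pred: dict, gold: dict) -> float:
    """Cell-level F1 between two {header: [values]} tables.

    A predicted cell counts as correct only if the same value appears
    under the same header at the same row index in the gold table.
    """
    pred_cells = {(c, i, v) for c, col in pred.items() for i, v in enumerate(col)}
    gold_cells = {(c, i, v) for c, col in gold.items() for i, v in enumerate(col)}
    if not pred_cells or not gold_cells:
        return 0.0
    tp = len(pred_cells & gold_cells)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_cells)
    recall = tp / len(gold_cells)
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: model got 3 of 4 cells right.
gold = {"Model": ["GPT-4", "Llama-3"], "Score": ["0.71", "0.64"]}
pred = {"Model": ["GPT-4", "Llama-3"], "Score": ["0.71", "0.55"]}
print(round(table_f1(pred, gold), 2))  # -> 0.75
```

Exact header matching is deliberately strict; a real schema-agnostic evaluator would likely also need fuzzy header alignment (e.g., "Score" vs. "Accuracy") and row matching that does not depend on output order.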
Authors

Tianyun Zhong
Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences

Guozhao Mo
Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences

Yanjiang Liu
UCAS

Yihan Chen
Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences

Lingdi Kong
Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences

Xuanang Chen
Institute of Software, Chinese Academy of Sciences
Information Retrieval, Natural Language Processing

Yaojie Lu
Institute of Software, Chinese Academy of Sciences
Information Extraction, Large Language Models

Hongyu Lin
Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences

Ben He
Professor, University of Chinese Academy of Sciences
Natural Language Processing, Information Retrieval

Le Sun
Institute of Software, CAS
Information Retrieval, Natural Language Processing