🤖 AI Summary
The exponential growth of academic literature poses significant challenges for efficiently constructing comparative tables in survey papers. Existing schema generation methods suffer from ambiguous evaluation criteria and limited editability. To address these issues, this paper proposes an intent-aware schema generation and editing framework: (1) it introduces intent modeling to mitigate semantic ambiguity in identifying comparison dimensions; (2) it designs an editable generation pipeline enabling on-demand customization of those dimensions; (3) it constructs the first benchmark dataset tailored for intent-conditioned schema generation; and (4) it combines LLM-based prompting with lightweight fine-tuning, pairing single-shot generation with multi-stage editing strategies. Experimental results demonstrate that intent conditioning substantially improves schema reconstruction accuracy, while the editing mechanism further refines output quality. Notably, a lightweight fine-tuned model achieves performance competitive with state-of-the-art prompted large language models.
📝 Abstract
The increasing volume of academic literature makes it essential for researchers to organize, compare, and contrast collections of documents. Large language models (LLMs) can support this process by generating schemas that define the shared aspects along which papers are compared. However, progress on schema generation has been slow due to (i) ambiguity in reference-based evaluations and (ii) a lack of editing and refinement methods. Our work is the first to address both issues. First, we present an approach for augmenting unannotated table corpora with synthesized intents and apply it to create a dataset for studying schema generation conditioned on a given information need, thus reducing ambiguity. With this dataset, we show that incorporating table intents significantly improves baseline performance in reconstructing reference schemas. Next, we propose several LLM-based schema editing techniques. We first comprehensively benchmark several single-shot schema generation methods, including prompted LLM workflows and fine-tuned models, showing that smaller, open-weight models can be fine-tuned to be competitive with state-of-the-art prompted LLMs. We then demonstrate that our editing techniques can further improve the schemas generated by these methods.
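To make the objects in the abstract concrete: a schema here is an intent (the information need) plus a set of shared comparison aspects (table columns), and "editing" means operations such as adding, removing, or renaming aspects. The sketch below is purely illustrative and not the paper's implementation; all names (`Schema`, `add_aspect`, `remove_aspect`, `rename_aspect`) are hypothetical.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Schema:
    # The information need the table should satisfy,
    # e.g. "compare retrieval-augmented QA systems".
    intent: str
    # The shared comparison dimensions (table columns).
    aspects: tuple[str, ...] = ()

def add_aspect(schema: Schema, aspect: str) -> Schema:
    """ADD edit: append a new comparison dimension if not already present."""
    if aspect in schema.aspects:
        return schema
    return replace(schema, aspects=schema.aspects + (aspect,))

def remove_aspect(schema: Schema, aspect: str) -> Schema:
    """REMOVE edit: drop a dimension the user does not need."""
    return replace(schema, aspects=tuple(a for a in schema.aspects if a != aspect))

def rename_aspect(schema: Schema, old: str, new: str) -> Schema:
    """RENAME edit: relabel a dimension without changing its position."""
    return replace(schema, aspects=tuple(new if a == old else a for a in schema.aspects))
```

A single-shot generator would produce an initial `Schema` from the papers plus the intent, after which a multi-stage editing loop applies operations like these until the schema matches the user's need:

```python
s = Schema(intent="compare LLM fine-tuning methods", aspects=("Dataset", "Model"))
s = add_aspect(s, "Training cost")
s = rename_aspect(s, "Model", "Base model")
# s.aspects == ("Dataset", "Base model", "Training cost")
```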