DeFine: A Decomposed and Fine-Grained Annotated Dataset for Long-form Article Generation

📅 2025-03-10
🤖 AI Summary
Current long-form article generation (LFAG) suffers from logical inconsistency, incomplete topic coverage, and narrative incoherence, primarily due to the absence of hierarchically structured datasets with fine-grained annotations. To address this, we propose DeFine: the first hierarchical, multi-level fine-grained dataset for LFAG, featuring annotations along three dimensions: logical chain validity, topic coverage completeness, and narrative coherence. DeFine is constructed via a multi-agent collaborative pipeline that integrates domain-knowledge injection, citation retrieval, question-answering-based annotation, and data cleaning. Using DeFine, we fine-tune Qwen2-7B-Instruct and design three LFAG baselines: web retrieval, local retrieval, and grounded reference. Experiments show significant improvements in topic coverage, depth of information, and content fidelity. The DeFine dataset is publicly released to support benchmarking and standardized evaluation in LFAG research.

📝 Abstract
Long-form article generation (LFAG) presents challenges such as maintaining logical consistency, comprehensive topic coverage, and narrative coherence across extended articles. Existing datasets often lack both the hierarchical structure and the fine-grained annotation needed to decompose the task effectively, resulting in shallow, disorganized articles. To address these limitations, we introduce DeFine, a Decomposed and Fine-grained annotated dataset for long-form article generation. DeFine is characterized by its hierarchical decomposition strategy and its integration of domain-specific knowledge with multi-level annotations, which together give granular control and greater depth in article generation. To construct the dataset, we propose a multi-agent collaborative pipeline that systematically segments the generation process into four parts: Data Miner, Cite Retriever, Q&A Annotator, and Data Cleaner. To validate the effectiveness of DeFine, we designed and tested three LFAG baselines: web retrieval, local retrieval, and grounded reference. We fine-tuned the Qwen2-7B-Instruct model on the DeFine training set. The experimental results show significant improvements in text quality, specifically in topic coverage, depth of information, and content fidelity. Our dataset is publicly available to facilitate future research.
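The four-part pipeline named in the abstract can be pictured as a chain of stages that each enrich a shared record. The sketch below is only an illustration of that shape: the stage names come from the abstract, but the `Record` fields, the placeholder logic inside each stage, and the assumption that the stages run sequentially in the listed order are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Record:
    """Hypothetical record passed between stages (field names are illustrative)."""
    topic: str
    sections: List[str] = field(default_factory=list)
    citations: List[str] = field(default_factory=list)
    qa_pairs: List[Tuple[str, str]] = field(default_factory=list)


Stage = Callable[[Record], Record]


def data_miner(rec: Record) -> Record:
    # Placeholder: decompose the topic into section headings.
    # A duplicate is included on purpose so the cleaner has work to do.
    rec.sections = [
        f"{rec.topic}: background",
        f"{rec.topic}: methods",
        f"{rec.topic}: background",
    ]
    return rec


def cite_retriever(rec: Record) -> Record:
    # Placeholder: attach one stand-in reference string per section.
    rec.citations = [f"reference for '{s}'" for s in rec.sections]
    return rec


def qa_annotator(rec: Record) -> Record:
    # Placeholder: pair a question about each section with its reference.
    rec.qa_pairs = [
        (f"What does '{s}' cover?", c)
        for s, c in zip(rec.sections, rec.citations)
    ]
    return rec


def data_cleaner(rec: Record) -> Record:
    # Placeholder: drop repeated sections, keeping the parallel lists aligned.
    seen = set()
    keep = []
    for i, s in enumerate(rec.sections):
        if s not in seen:
            seen.add(s)
            keep.append(i)
    rec.sections = [rec.sections[i] for i in keep]
    rec.citations = [rec.citations[i] for i in keep]
    rec.qa_pairs = [rec.qa_pairs[i] for i in keep]
    return rec


# Assumed order: the order in which the abstract lists the four parts.
PIPELINE: List[Stage] = [data_miner, cite_retriever, qa_annotator, data_cleaner]


def run_pipeline(rec: Record) -> Record:
    # Apply each stage in turn to the same record.
    for stage in PIPELINE:
        rec = stage(rec)
    return rec
```

Calling `run_pipeline(Record(topic="solar power"))` returns a record whose sections, citations, and Q&A pairs stay index-aligned after cleaning. Keeping each stage as a plain function over one record type makes it easy to swap a placeholder for a real agent without touching the others.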
Problem

Research questions and friction points this paper is trying to address.

Addresses challenges in long-form article generation
Introduces DeFine dataset with hierarchical annotations
Improves text quality in topic coverage and depth
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical decomposition strategy for LFAG
Multi-agent collaborative pipeline construction
Validation by fine-tuning Qwen2-7B-Instruct on DeFine
Authors

Ming Wang
North China University of Technology
Fang Wang
Postdoc, Stanford University
Reading acquisition, dyslexia, cross-linguistic research, bilingualism, cognitive neuroscience
Minghao Hu
Center of Information Research, AMS
Li He
North China University of Technology
Haiyang Wang
National University of Defense Technology
Jun Zhang
Center of Information Research, AMS
Tianwei Yan
National University of Defense Technology
Li Li
North China University of Technology
Zhunchen Luo
Unknown affiliation
Wei Luo
Center of Information Research, AMS
Xiaoying Bai
Tsinghua University
Software engineering, software testing, service-oriented computing, cloud computing
Guotong Geng
Center of Information Research, AMS