🤖 AI Summary
Problem: Text-to-structure generation (e.g., tables, knowledge graphs, charts) is foundational infrastructure for agent-centric AI, enabling context-aware retrieval and autonomous reasoning, yet the field suffers from fragmented methodologies, scarce standardized datasets, and inconsistent evaluation protocols. Method: We conduct a systematic literature review integrating techniques from NLP, information extraction, knowledge representation, and machine learning to establish the first holistic analytical framework, comprising a task taxonomy, a benchmark dataset inventory, and unified evaluation metrics. Contribution/Results: We introduce the first general-purpose evaluation framework for structured output generation and explicitly identify methodological limitations and core challenges (e.g., fidelity, composability, and reasoning-aware assessment). We map research gaps and affirm the centrality of this direction to next-generation AI systems, providing both theoretical grounding and practical guidance for future algorithmic development and empirical validation.
📝 Abstract
The evolution of AI systems toward agentic operation and context-aware retrieval necessitates transforming unstructured text into structured formats such as tables, knowledge graphs, and charts. While such conversions enable critical applications ranging from summarization to data mining, current research lacks a comprehensive synthesis of methodologies, datasets, and metrics. This systematic review examines text-to-structure techniques and the challenges they encounter, evaluates current datasets and assessment criteria, and outlines potential directions for future research. We also introduce a universal evaluation framework for structured outputs and position text-to-structure as foundational infrastructure for next-generation AI systems.