Extracting Abstraction Dimensions by Identifying Syntax Pattern from Texts

2025-04-26
AI Summary
This paper addresses the challenge of structured understanding and efficient querying of natural language text. We propose a novel four-dimensional syntactic abstraction model that automatically identifies four semantic dimensions: subject (topic), predicate (action), object (entity), and adverbial (modifier). To our knowledge, this is the first work to systematically define and implement end-to-end extraction of such a four-dimensional structure. Our method introduces a hierarchy-aware abstraction tree construction paradigm grounded in subclass relationships, ensuring comprehensiveness, non-redundancy, and query efficiency. Furthermore, we design a multi-tree joint retrieval mechanism enabling dimension-level natural language queries. Experimental results demonstrate that the four-dimensional abstraction trees achieve precision, recall, and F1 scores exceeding 80%; support high query coverage; significantly reduce search space; and enable rapid, accurate sentence-level localization.

Abstract
This paper proposes an approach for automatically discovering the subject, action, object, and adverbial dimensions of texts, so that texts can be operated on efficiently and queried in natural language. Each dimension is represented as an abstraction tree. The high quality of the trees guarantees that all subjects, actions, objects, and adverbials within the texts, together with their subclass relations, can be represented. The independence of the trees ensures that there is no redundant representation between trees. The expressiveness of the trees ensures that the majority of sentences can be accessed from each tree and the remaining sentences from at least one tree, so the tree-based search mechanism can support natural language queries. Experiments show that the average precision, recall, and F1-score of the abstraction trees constructed from the subclass relations of subject, action, object, and adverbial all exceed 80%. Applying the approach to natural language querying demonstrates that different types of question patterns for querying a subject or an object have high coverage of the texts, and that searching multiple trees on subject, action, object, and adverbial according to the question pattern quickly reduces the search space to locate target sentences, which supports precise operations on texts.
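The precision, recall, and F1-score mentioned in the abstract can be illustrated with a minimal sketch. It assumes that a constructed tree is scored by comparing its (child, parent) subclass edges against a hand-labelled gold set; the evaluation protocol is not described on this page, so the function name prf1 and the toy edges below are hypothetical.

```python
def prf1(predicted_edges, gold_edges):
    """Score predicted subclass edges against gold edges.

    Assumption: an edge is a (child, parent) pair and counts as correct
    only when the exact pair appears in the gold set.
    """
    predicted, gold = set(predicted_edges), set(gold_edges)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Hypothetical toy data: subclass edges of a small subject tree.
gold = {("dog", "animal"), ("cat", "animal"),
        ("poodle", "dog"), ("animal", "entity")}
predicted = {("dog", "animal"), ("cat", "animal"),
             ("poodle", "dog"), ("poodle", "cat")}

precision, recall, f1 = prf1(predicted, gold)
print(precision, recall, f1)
```

Here three of the four predicted edges are in the gold set and three of the four gold edges are recovered, so all three scores come out at 0.75.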
Problem

Research questions and friction points this paper is trying to address.

Automatically discovering text dimensions for efficient querying
Ensuring high-quality tree representation of text elements
Supporting natural language queries with precise search mechanisms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatically discovers text dimensions via syntax patterns
Uses high-quality trees for comprehensive text representation
Supports natural language queries with tree-based search
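
The tree-based search can be sketched as follows. This is an illustrative reading of the abstract, not the authors' implementation: it assumes each dimension has its own tree of subclass relations, that every term is indexed to the sentences it occurs in, and that a query naming several dimensions intersects the per-tree results. The class AbstractionTree, the function search, and the sample data are all hypothetical.

```python
from collections import defaultdict


class AbstractionTree:
    """One dimension (e.g. subject) as a tree of subclass relations."""

    def __init__(self):
        self.children = defaultdict(set)   # parent term -> child terms
        self.sentences = defaultdict(set)  # term -> ids of sentences using it

    def add_subclass(self, child, parent):
        self.children[parent].add(child)

    def index(self, term, sentence_id):
        self.sentences[term].add(sentence_id)

    def lookup(self, term):
        """Return ids of sentences that use the term or any of its subclasses."""
        found, stack, seen = set(), [term], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            found |= self.sentences.get(node, set())
            stack.extend(self.children.get(node, ()))
        return found


def search(trees, query):
    """Intersect the per-dimension results for every dimension in the query."""
    result = None
    for dimension, term in query.items():
        hits = trees[dimension].lookup(term)
        result = hits if result is None else result & hits
    return result if result is not None else set()


# Hypothetical corpus: 0 "The poodle chased the ball.",
# 1 "The cat ate fish.", 2 "The dog ate meat."
subject, action = AbstractionTree(), AbstractionTree()
for child, parent in [("poodle", "dog"), ("dog", "animal"), ("cat", "animal")]:
    subject.add_subclass(child, parent)
for child, parent in [("chase", "move"), ("eat", "consume")]:
    action.add_subclass(child, parent)
for term, sid in [("poodle", 0), ("cat", 1), ("dog", 2)]:
    subject.index(term, sid)
for term, sid in [("chase", 0), ("eat", 1), ("eat", 2)]:
    action.index(term, sid)

trees = {"subject": subject, "action": action}
print(search(trees, {"subject": "dog", "action": "consume"}))
```

In this toy corpus the subject tree alone narrows "dog" to sentences 0 and 2, and intersecting with the action tree for "consume" leaves only sentence 2, which is the search-space reduction the abstract describes.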
Jian Zhou
Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Jiazheng Li
Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Sirui Zhuge
King's College London, UK; Publicis Sapient, UK
Hai Zhuge
Professor of Computing, Chinese Academy of Sciences
Cyber-Physical Society, Artificial Intelligence, Database, Web Semantics, Knowledge Grid