🤖 AI Summary
Penman linearization, the de facto standard for serializing AMR graphs, suffers from two fundamental limitations: (i) it distorts node locality in deep graphs, and (ii) its tree-based encoding requires inverse roles for reentrant nodes, doubling the number of relation types to predict. Method: We propose a triple-based graph linearization that encodes AMR graphs as sequences of (subject, predicate, object) triples, keeping related nodes adjacent and representing reentrancies without inverse roles. Contribution/Results: This work systematically characterizes Penman's dual deficiencies in preserving node proximity and handling relational complexity, and compares the two linearizations within a common evaluation framework. While triples avoid both problems by design, the empirical comparison shows that triple encoding is less compact than Penman's concise, explicit representation of nested structure, leaving room for improvement before it can fully compete. These findings offer empirically grounded design guidance for graph-to-sequence linearization.
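To make the two limitations concrete, here is a minimal, self-contained sketch (our illustration, not material from the paper) built on the textbook AMR for "The boy wants to go":

```python
# Illustrative Penman strings (ours, not from the paper) showing the two
# limitations named above.

# (i) Node locality: w's second argument (g / go-02) is serialized only
# after the entire subtree under :ARG0 is closed, so the deeper that
# subtree grows, the farther apart the related nodes w and g drift.
deep_penman = (
    "(w / want-01"
    " :ARG0 (b / boy :mod (l / little :degree (v / very)))"  # intervening subtree
    " :ARG1 (g / go-02 :ARG0 b))"                            # w's argument, pushed far away
)

# (ii) Inverse roles: Penman serializes a spanning tree of the graph, so
# any edge traversed against its direction must be flipped into an
# inverse role such as :ARG0-of, doubling the relation inventory.
inverse_penman = "(b / boy :ARG0-of (g / go-02))"  # encodes the edge go-02 :ARG0 b
```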
📄 Abstract
Sequence-to-sequence models are widely used to train Abstract Meaning Representation (AMR; Banarescu et al., 2013) parsers. To train such models, AMR graphs have to be linearized into a one-line text format. While Penman encoding is typically used for this purpose, we argue that it has limitations: (1) for deep graphs, some closely related nodes are located far apart in the linearized text; (2) Penman's tree-based encoding necessitates inverse roles to handle node re-entrancy, doubling the number of relation types to predict. To address these issues, we propose a triple-based linearization method and compare its efficiency with Penman linearization. Although triples are well suited to representing a graph, our results suggest there is room to improve triple encoding before it can compete with Penman's concise and explicit representation of a nested graph structure.
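As a concrete point of comparison, the sketch below (our illustration; the paper's exact triple format may differ) derives a triple sequence from a Penman string using the `penman` Python library, whose `Graph.triples` attribute exposes the graph as (source, role, target) triples:

```python
# A minimal sketch of triple-based linearization, assuming the `penman`
# Python library (pip install penman); the flat surface format below is
# our illustrative choice, not necessarily the paper's serialization.
import penman

graph = penman.decode("(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-02 :ARG0 b))")

# Every node label and edge becomes one (source, role, target) triple; the
# reentrant node b is simply the target of two triples, with no inverse roles.
linearized = " | ".join(f"{src} {role} {tgt}" for src, role, tgt in graph.triples)
print(linearized)
# e.g.: w :instance want-01 | w :ARG0 b | b :instance boy | w :ARG1 g | ...
```

Here the reentrancy costs only a repeated variable mention (b appears as a target twice) rather than an inverted edge label; the trade-off the abstract points to is that this flat sequence is longer and less explicitly nested than the Penman string it was derived from.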