🤖 AI Summary
This work addresses the problem of controllable generation of tree-structured data: generating all valid ordered, rooted, labeled trees within a specified tree edit distance from a given input tree. We propose a feedforward neural network framework employing ReLU activation functions, integrated with tree-traversal-based encoding and symbolic constraint-driven decoding to enable precise modeling of discrete tree structures. Theoretically, we prove that a ReLU network of size *O*(*n*³) and constant depth suffices to generate the entire set of trees satisfying the distance constraint, yielding a deterministic and formally verifiable tree generation scheme. Empirically, the implemented networks generate all valid trees within the specified distance for trees with up to 21 nodes, whereas the non-deterministic baselines GraphRNN and GraphGDP reach validation rates of only up to 35% and 48%, respectively.
📝 Abstract
The generation of trees with a specified tree edit distance has significant applications across various fields, including computational biology, structured data analysis, and image processing. Recently, generative networks have been increasingly employed to synthesize new data that closely resembles the original datasets. However, the appropriate size and depth of generative networks required to generate data with a specified tree edit distance remain unclear. In this paper, we theoretically establish the existence and construction of generative networks capable of producing trees similar to a given tree with respect to the tree edit distance. Specifically, for a given rooted, ordered, and vertex-labeled tree T of size n + 1 with labels from an alphabet Σ, and a non-negative integer d, we prove that all rooted, ordered, and vertex-labeled trees over Σ with tree edit distance at most d from T can be generated using a ReLU-based generative network with size O(n^3) and constant depth. The proposed networks were implemented and evaluated for generating trees with up to 21 nodes. Due to their deterministic architecture, the networks successfully generated all valid trees within the specified tree edit distance. In contrast, the state-of-the-art graph generative models GraphRNN and GraphGDP, which rely on non-deterministic mechanisms, produced significantly fewer valid trees, achieving validation rates of only up to 35% and 48%, respectively. These findings provide a theoretical foundation toward the construction of compact generative models and open new directions for exact and valid tree-structured data generation. An implementation of the proposed networks is available at https://github.com/MGANN-KU/TreeGen_ReLUNetworks.
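To make the target set concrete, the following is a minimal brute-force sketch that enumerates every tree within edit distance d of a small input tree. It is not the ReLU construction from the paper. It assumes unit-cost substitution, deletion, and insertion, assumes the root is never deleted or replaced by a newly inserted root, and represents a tree as a nested tuple `(label, children)`. The function names `one_edit_neighbors` and `trees_within` are hypothetical.

```python
def one_edit_neighbors(tree, alphabet):
    """Return all trees obtained from `tree` by exactly one edit operation.

    A tree is a pair (label, children), where children is a tuple of trees.
    Assumption: the root itself is never deleted and no node is inserted above it.
    """
    label, children = tree
    out = set()
    # Substitution: give this node a different label.
    for a in alphabet:
        if a != label:
            out.add((a, children))
    # Deletion: remove child i and splice its children into its position.
    for i, (_, grandchildren) in enumerate(children):
        out.add((label, children[:i] + grandchildren + children[i + 1:]))
    # Insertion: add a new child that adopts the consecutive run children[i:j].
    for i in range(len(children) + 1):
        for j in range(i, len(children) + 1):
            for a in alphabet:
                new_node = (a, children[i:j])
                out.add((label, children[:i] + (new_node,) + children[j:]))
    # Recursion: apply one edit somewhere inside a child's subtree.
    for i, child in enumerate(children):
        for edited in one_edit_neighbors(child, alphabet):
            out.add((label, children[:i] + (edited,) + children[i + 1:]))
    return out


def trees_within(tree, alphabet, d):
    """Return all trees reachable from `tree` with at most d unit-cost edits."""
    seen = {tree}
    frontier = {tree}
    for _ in range(d):
        frontier = {t for s in frontier
                    for t in one_edit_neighbors(s, alphabet)} - seen
        seen |= frontier
    return seen


if __name__ == "__main__":
    single = ("a", ())
    print(len(trees_within(single, ("a", "b"), 1)))
```

The enumeration grows exponentially with d and the tree size, so it is only useful as a reference set for checking a generator's output on very small inputs.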