🤖 AI Summary
Existing methods for generating complex tree structures—both static and dynamically growing—suffer from low inference efficiency and high memory consumption. Method: We propose a novel hourglass-shaped multi-resolution Transformer architecture with tailored training strategies. Specifically, we introduce the first hourglass-style multi-scale token compression mechanism incorporating long-range skip connections; design an autoregressive tree serialization encoding scheme coupled with differentiable 4D growth modeling; and achieve, for the first time, cross-modal conditional generation from images or point clouds to tree structures. Contribution/Results: Compared to standard Transformer baselines, our approach maintains high generation fidelity while significantly accelerating inference speed and reducing memory footprint. It enables efficient modeling of highly complex tree topologies and supports controllable, physics-informed growth simulation.
📝 Abstract
We propose a transformer architecture and training strategy for tree generation. The architecture processes data at multiple resolutions and has an hourglass shape: middle layers process fewer tokens than outer layers. As in convolutional networks, we introduce longer-range skip connections to complement this multi-resolution approach. The key advantages of this architecture are faster processing and lower memory consumption, which allow us to handle more complex trees than a vanilla transformer architecture would. Furthermore, we extend this approach to image-to-tree and point-cloud-to-tree conditional generation, and to simulating the tree growth process, generating 4D trees. Empirical results validate our approach in terms of speed, memory consumption, and generation quality.
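The hourglass idea above can be illustrated with a minimal sketch: tokens are compressed before the middle layers, processed at the lower resolution (where attention is cheaper), then expanded back, with a long-range skip connection around the compressed path. This is only a schematic under assumed design choices (mean-pooling for compression, repetition for expansion, a placeholder `inner_fn` in place of the actual middle transformer layers); it is not the authors' implementation.

```python
import numpy as np

def downsample(tokens, factor=2):
    """Merge groups of `factor` adjacent tokens by mean-pooling (assumed scheme)."""
    n, d = tokens.shape
    return tokens.reshape(n // factor, factor, d).mean(axis=1)

def upsample(tokens, factor=2):
    """Expand each coarse token back into `factor` fine tokens by repetition."""
    return np.repeat(tokens, factor, axis=0)

def hourglass_block(tokens, inner_fn, factor=2):
    """Hourglass stage: compress -> process at low resolution -> expand,
    with a long-range skip connection bridging the compressed path."""
    skip = tokens                        # long-range skip, as in U-Net-style CNNs
    coarse = downsample(tokens, factor)  # middle layers see fewer tokens
    coarse = inner_fn(coarse)            # stand-in for the middle transformer layers
    fine = upsample(coarse, factor)
    return fine + skip                   # skip connection restores fine-scale detail

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))          # 8 tokens, 4-dim embeddings
y = hourglass_block(x, inner_fn=lambda t: t * 2.0)
print(x.shape, y.shape)                  # token count and width preserved end-to-end
```

The efficiency gain comes from the middle layers attending over `n / factor` tokens: with quadratic-cost attention, each compression step cuts that stage's attention cost by roughly `factor**2`, while the skip connection preserves the fine-resolution information the pooling discards.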