Training-free Spatially Grounded Geometric Shape Encoding (Technical Report)

📅 2026-04-08

🤖 AI Summary

This work proposes XShapeEnc, a training-free, general-purpose encoding method for 2D shapes that addresses the limitations of conventional 1D positional encodings. XShapeEnc normalizes an arbitrary 2D geometric shape to the unit disk, represents its geometry with orthogonal Zernike bases, and captures its pose through a harmonic pose field; geometry and pose can be encoded either jointly or independently. A frequency-propagation step then injects high-frequency content into the encoding. The summary lists five desirable properties: invertibility, adaptivity, frequency richness, robustness, and computational efficiency. Experiments across diverse shape-aware tasks and on the newly introduced XShapeCorpus dataset are reported to demonstrate theoretical validity, efficiency, discriminability, and broad applicability.
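
For orientation, the textbook definition of the Zernike basis on the unit disk is given below. This is the standard formulation, not necessarily the exact normalization or indexing used by XShapeEnc, and the symbols $f$, $a_{nm}$, and $N$ are notation assumed here for illustration. Orthogonality is what makes the coefficients recoverable by projection and lets a truncated sum approximate the shape, which is one plausible reading of the invertibility property.

```latex
% Zernike polynomial of order n and repetition m (n >= |m|, n - |m| even)
V_n^m(\rho,\theta) = R_n^{|m|}(\rho)\, e^{i m \theta}, \qquad 0 \le \rho \le 1

% Radial polynomial
R_n^{|m|}(\rho) = \sum_{k=0}^{(n-|m|)/2}
  \frac{(-1)^k\,(n-k)!}
       {k!\,\left(\tfrac{n+|m|}{2}-k\right)!\,\left(\tfrac{n-|m|}{2}-k\right)!}
  \,\rho^{\,n-2k}

% Orthogonality on the unit disk
\int_0^{2\pi}\!\!\int_0^1 V_n^m(\rho,\theta)\,\overline{V_{n'}^{m'}(\rho,\theta)}\;\rho\,d\rho\,d\theta
  = \frac{\pi}{n+1}\,\delta_{nn'}\,\delta_{mm'}

% Projection of a shape function f onto the basis, and truncated reconstruction
a_{nm} = \frac{n+1}{\pi}\int_0^{2\pi}\!\!\int_0^1 f(\rho,\theta)\,\overline{V_n^m(\rho,\theta)}\;\rho\,d\rho\,d\theta,
\qquad
f(\rho,\theta) \approx \sum_{n=0}^{N}\sum_{m} a_{nm}\,V_n^m(\rho,\theta)
```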

📝 Abstract

Positional encoding has become the de facto standard for grounding deep neural networks on discrete point-wise positions, and it has achieved remarkable success in tasks where the input can be represented as a one-dimensional sequence. However, extending this concept to 2D spatial geometric shapes demands carefully designed encoding strategies that account not only for shape geometry and pose, but also for compatibility with neural network learning. In this work, we address these challenges by introducing a training-free, general-purpose encoding strategy, dubbed XShapeEnc, that encodes an arbitrary spatially grounded 2D geometric shape into a compact representation exhibiting five favorable properties, including invertibility, adaptivity, and frequency richness. Specifically, a 2D spatially grounded geometric shape is decomposed into its normalized geometry within the unit disk and its pose vector, where the pose is further transformed into a harmonic pose field that also lies within the unit disk. A set of orthogonal Zernike bases is constructed to encode shape geometry and pose either independently or jointly, followed by a frequency-propagation operation to introduce high-frequency content into the encoding. We demonstrate the theoretical validity, efficiency, discriminability, and applicability of XShapeEnc via extensive analysis and experiments across a wide range of shape-aware tasks and our self-curated XShapeCorpus. We envision XShapeEnc as a foundational tool for research that goes beyond one-dimensional sequential data toward frontier 2D spatial intelligence.
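
The geometry step described in the abstract (a shape normalized to the unit disk and projected onto orthogonal Zernike bases) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the shape is given as a binary raster mask, and the function names `zernike_radial`, `zernike_index_pairs`, and `encode_mask` and the parameter `max_order` are hypothetical. The harmonic pose field and the frequency-propagation step are omitted because the abstract does not specify them.

```python
import numpy as np
from math import factorial


def zernike_radial(n, m, rho):
    """Standard Zernike radial polynomial R_n^|m|(rho); requires n - |m| even."""
    m = abs(m)
    out = np.zeros_like(rho, dtype=float)
    for k in range((n - m) // 2 + 1):
        coef = (-1) ** k * factorial(n - k) / (
            factorial(k)
            * factorial((n + m) // 2 - k)
            * factorial((n - m) // 2 - k)
        )
        out += coef * rho ** (n - 2 * k)
    return out


def zernike_index_pairs(max_order):
    """All valid (n, m) pairs with n <= max_order, |m| <= n, n - |m| even."""
    return [(n, m) for n in range(max_order + 1) for m in range(-n, n + 1, 2)]


def encode_mask(mask, max_order=8):
    """Project a binary mask onto Zernike bases (illustrative sketch only).

    Assumption: the mask's bounding square is mapped to [-1, 1]^2 and pixels
    outside the unit disk are ignored.
    """
    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x = (2 * xs + 1) / w - 1  # pixel-center x coordinates in (-1, 1)
    y = (2 * ys + 1) / h - 1  # pixel-center y coordinates in (-1, 1)
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)
    f = mask.astype(float) * (rho <= 1.0)  # keep only pixels inside the disk
    pixel_area = (2.0 / w) * (2.0 / h)
    coeffs = []
    for n, m in zernike_index_pairs(max_order):
        basis = zernike_radial(n, m, rho) * np.exp(1j * m * theta)
        # Discrete approximation of (n + 1) / pi * integral of f * conj(V) dA
        coeffs.append((n + 1) / np.pi * np.sum(f * np.conj(basis)) * pixel_area)
    return np.array(coeffs)


if __name__ == "__main__":
    square = np.zeros((64, 64), dtype=bool)
    square[16:48, 16:48] = True
    code = encode_mask(square, max_order=8)
    print(code.shape)  # one complex coefficient per (n, m) pair
```

In this sketch a rotation of the shape only changes the phase of each coefficient, so the coefficient magnitudes are rotation-invariant. Storing pose separately, as the abstract describes, is consistent with that property, although how XShapeEnc combines the two is not detailed here.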

Problem

Research questions and friction points this paper is trying to address.

positional encoding

geometric shape

spatial grounding

2D shape representation

training-free encoding

Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free encoding

spatially grounded shapes

Zernike basis