🤖 AI Summary
Neural networks often struggle to generalize to unseen tokens that are semantically equivalent to, but syntactically distinct from, those seen during training, such as interchangeable bound variables. To address this limitation, this work proposes a novel Transformer architecture that employs parallel embedding streams to disentangle the representations of interchangeable symbols, together with an aggregation attention mechanism that enables structured information sharing across streams. This design is the first to rigorously achieve invariance under symbol renaming. The approach supports semantic generalization in open-vocabulary settings and significantly outperforms existing models on tasks requiring generalization to novel symbols, offering both strong theoretical guarantees and empirical effectiveness.
📝 Abstract
Current neural architectures lack a principled way to handle interchangeable tokens, i.e., symbols that are semantically equivalent yet distinguishable, such as bound variables. As a result, models trained on fixed vocabularies often struggle to generalize to unseen symbols, even when the underlying semantics remain unchanged. We propose a novel Transformer-based mechanism that is provably invariant to the renaming of interchangeable tokens. Our approach employs parallel embedding streams to isolate the contribution of each interchangeable token in the input, combined with an aggregated attention mechanism that enables structured information sharing across streams. Experimental results confirm the theoretical guarantees of our method and demonstrate substantial performance gains on open-vocabulary tasks that require generalization to novel symbols.
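As a rough intuition for why parallel per-symbol streams combined with a symmetric aggregation can yield renaming invariance, consider the following toy sketch. It is not the paper's architecture: `renaming_invariant_rep` is a hypothetical illustration that builds one indicator "stream" per distinct symbol and sums a symmetric product over streams, so the result encodes only which positions share a symbol and is therefore unchanged under any bijective renaming.

```python
def renaming_invariant_rep(seq):
    """Toy illustration (not the paper's mechanism): one indicator
    'stream' per distinct symbol, combined by a symmetric sum of
    pairwise products. The output records only which positions hold
    the same symbol, so any bijective renaming leaves it unchanged."""
    n = len(seq)
    rep = [[0] * n for _ in range(n)]
    for v in set(seq):
        stream = [int(t == v) for t in seq]   # this symbol's occurrence pattern
        for i in range(n):
            for j in range(n):
                rep[i][j] += stream[i] * stream[j]  # order-free aggregation
    return rep

# Renaming x->a, y->b, z->c preserves the representation exactly.
assert renaming_invariant_rep(["x", "y", "x", "z"]) == \
       renaming_invariant_rep(["a", "b", "a", "c"])
```

Sequences with genuinely different binding structure (e.g. `["x","x","y"]` vs. `["x","y","y"]`) still map to different representations, so invariance here does not collapse distinct inputs.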