🤖 AI Summary
Standard Transformers lack an explicit mechanism for modeling and routing relational structure among objects, which limits their performance on relational reasoning tasks. To address this, the Dual Attention Transformer (DAT) decouples and jointly models two kinds of information: sensory information (the attributes of individual objects) and relational information (structured associations between objects). DAT pairs two complementary attention mechanisms: sensory attention, which directs the flow of object-level attribute representations, and a novel relational attention mechanism, which explicitly captures, aggregates, and propagates relational dependencies. Evaluations span synthetic relational reasoning benchmarks as well as real-world language modeling and vision tasks, and demonstrate substantial improvements in both data efficiency and parameter efficiency, supporting the effectiveness and architectural versatility of explicit relational modeling in Transformer-based models.
📝 Abstract
Relational reasoning is a central component of generally intelligent systems, enabling robust and data-efficient inductive generalization. Recent empirical evidence shows that many existing neural architectures, including Transformers, struggle with tasks requiring relational reasoning. In this work, we distinguish between two types of information: sensory information about the properties of individual objects, and relational information about the relationships between objects. While neural attention provides a powerful mechanism for controlling the flow of sensory information between objects, the Transformer lacks an explicit computational mechanism for routing and processing relational information. To address this limitation, we propose an architectural extension of the Transformer framework that we call the Dual Attention Transformer (DAT), featuring two distinct attention mechanisms: sensory attention for directing the flow of sensory information, and a novel relational attention mechanism for directing the flow of relational information. We empirically evaluate DAT on a diverse set of tasks ranging from synthetic relational benchmarks to complex real-world tasks such as language modeling and visual processing. Our results demonstrate that integrating explicit relational computational mechanisms into the Transformer architecture leads to significant performance gains in terms of data efficiency and parameter efficiency.
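To make the distinction between the two attention mechanisms concrete, here is a minimal NumPy sketch. It is an illustration, not the paper's exact formulation: the function names, the specific relation parameterization (per-map inner products `⟨Wqr[l] x_i, Wkr[l] x_j⟩`), the symbol vectors `S`, and all dimensions are assumptions chosen for clarity. Both heads share the same softmax routing weights in this toy version; the key difference is *what* those weights route — object attributes in sensory attention versus pairwise relation vectors in relational attention.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sensory_attention(X, Wq, Wk, Wv):
    # Standard self-attention: routes sensory (attribute) information,
    # sending object j's value vector to object i.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return A @ V

def relational_attention(X, Wq, Wk, Wqr, Wkr, Wr, S):
    # Illustrative relational head (assumed parameterization): the softmax
    # routing weights deliver relation vectors R[i, j], tagged with symbol
    # vectors S[j], instead of object j's own attributes.
    Q, K = X @ Wq, X @ Wk
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))  # (n, n) routing weights
    # R[i, j, l] = <Wqr[l] x_i, Wkr[l] x_j>: an L-dim relation per pair.
    Qr = np.einsum('lde,nd->nle', Wqr, X)
    Kr = np.einsum('lde,nd->nle', Wkr, X)
    R = np.einsum('nle,mle->nml', Qr, Kr)
    msgs = R @ Wr + S[None, :, :]                # (n, n, d) messages
    return np.einsum('nm,nmd->nd', A, msgs)

n, d, dr, L = 4, 8, 8, 3                         # toy sizes (assumed)
X = rng.standard_normal((n, d))                  # object representations
Wq, Wk, Wv = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
Wqr = 0.1 * rng.standard_normal((L, d, dr))      # per-relation query maps
Wkr = 0.1 * rng.standard_normal((L, d, dr))      # per-relation key maps
Wr = 0.1 * rng.standard_normal((L, d))           # relation -> model-dim map
S = 0.1 * rng.standard_normal((n, d))            # learned symbol vectors

sensory_out = sensory_attention(X, Wq, Wk, Wv)                     # (n, d)
relational_out = relational_attention(X, Wq, Wk, Wqr, Wkr, Wr, S)  # (n, d)
```

In a full DAT block, outputs of the two head types would be concatenated and fed through the usual feed-forward and residual machinery; this sketch only isolates the routing difference.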