🤖 AI Summary
Existing theoretical analyses of Dense Associative Memory (DAM) models assume feature independence, yet real-world data exhibit pervasive feature correlations—whose impact on DAM storage capacity remains unquantified.
Method: We propose a Hamming-distance-based synthetic data generation framework that enables precise control over both feature correlation strength and pattern separability. Combining binary search with information-theoretic capacity estimation, we empirically measure DAM storage capacity across varying correlation levels and energy function orders.
Contribution/Results: We find that memory capacity grows exponentially with pattern separability but is significantly suppressed by feature correlation; moreover, this suppression intensifies with higher-order energy functions. This work provides the first quantitative characterization of how feature correlation dynamically constrains DAM capacity, revealing inherent limitations in modeling high-order interactions. Our findings establish a theoretical foundation—and empirical validation—for preprocessing strategies such as feature centering and decorrelation in DAM-based learning systems.
📝 Abstract
We investigate how feature correlations influence the capacity of Dense Associative Memory (DAM), a Transformer attention-like model. Practical machine learning scenarios involve feature-correlated data and learn representations in the input space, but current capacity analyses do not account for this. We develop an empirical framework to analyze the effects of data structure on capacity dynamics. Specifically, we systematically construct datasets that vary in feature correlation and pattern separation using Hamming distance from information theory, and compute the model's corresponding storage capacity using a simple binary search algorithm. Our experiments confirm that memory capacity scales exponentially with increasing separation in the input space. Feature correlations do not alter this relationship fundamentally, but reduce capacity slightly at constant separation. This effect is amplified at higher polynomial degrees in the energy function, suggesting that Associative Memory is more limited in depicting higher-order interactions between features than patterns. Our findings bridge theoretical work and practical settings for DAM, and might inspire more data-centric methods.