🤖 AI Summary
Problem: Conventional similarity measures for molecular structure classification capture higher-order topological features poorly, which limits their discriminative power and metric rigor. Method: This paper proposes a cohomology-based Gromov–Hausdorff ultrametric framework. Molecules are modeled as simplicial complexes; their cohomology vector spaces, which encode loops, cavities, and other higher-dimensional topological invariants, are computed; and these spaces are equipped with a distance from which a Gromov–Hausdorff ultrametric between structures is obtained. Contribution/Results: The authors present the approach as the first to use cohomological invariants within the Gromov–Hausdorff framework. Experiments on organic–inorganic halide perovskite (OIHP) structures indicate that the method clusters molecular structures effectively and, by incorporating geometric information, yields deeper insight than traditional persistent homology techniques.
📝 Abstract
We introduce, for the first time, a cohomology-based Gromov-Hausdorff ultrametric method for analyzing 1-dimensional and higher-dimensional (co)homology groups. The method focuses on loops, voids, and higher-dimensional cavity structures in simplicial complexes and addresses typical clustering questions arising in molecular data analysis. The Gromov-Hausdorff distance quantifies the dissimilarity between two metric spaces. In this framework, molecules are represented as simplicial complexes, and their cohomology vector spaces are computed to capture intrinsic topological invariants encoding loop and cavity structures. These vector spaces are equipped with a suitable distance measure, enabling the computation of the Gromov-Hausdorff ultrametric to evaluate structural dissimilarities. We demonstrate the methodology using organic-inorganic halide perovskite (OIHP) structures. The results highlight the effectiveness of this approach in clustering various molecular structures. By incorporating geometric information, our method provides deeper insight than traditional persistent homology techniques.
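To make the phrase "cohomology vector spaces ... encoding loop and cavity structures" concrete, the following minimal Python sketch computes the dimensions of those spaces (the Betti numbers) for a small simplicial complex. It is an illustration only and not the paper's pipeline: it works over the two-element field GF(2), where cohomology and homology have the same dimension, and it stops before any distance or Gromov-Hausdorff step. The function names `closure`, `rank_gf2`, and `betti_numbers` are hypothetical and chosen for this example.

```python
from itertools import combinations


def closure(maximal_simplices):
    """Return all faces of the given simplices, grouped by dimension."""
    faces = set()
    for simplex in maximal_simplices:
        simplex = tuple(sorted(simplex))
        for size in range(1, len(simplex) + 1):
            faces.update(combinations(simplex, size))
    by_dim = {}
    for face in faces:
        by_dim.setdefault(len(face) - 1, []).append(face)
    return {dim: sorted(fs) for dim, fs in by_dim.items()}


def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    pivots = {}  # highest set bit -> reduced row with that pivot
    rank = 0
    for row in rows:
        while row:
            pivot = row.bit_length() - 1
            if pivot in pivots:
                row ^= pivots[pivot]  # eliminate the leading bit
            else:
                pivots[pivot] = row
                rank += 1
                break
    return rank


def betti_numbers(maximal_simplices):
    """Betti numbers b_0, b_1, ... over GF(2) (assumed coefficient field)."""
    cx = closure(maximal_simplices)
    top = max(cx)
    # ranks[k] is the rank of the boundary map from k-chains to (k-1)-chains.
    ranks = {0: 0, top + 1: 0}
    for k in range(1, top + 1):
        index = {face: i for i, face in enumerate(cx[k - 1])}
        rows = []
        for simplex in cx[k]:
            mask = 0
            for i in range(len(simplex)):
                # Dropping vertex i gives one codimension-1 face.
                mask |= 1 << index[simplex[:i] + simplex[i + 1:]]
            rows.append(mask)
        ranks[k] = rank_gf2(rows)
    # b_k = dim C_k - rank(d_k) - rank(d_{k+1})
    return [len(cx[k]) - ranks[k] - ranks[k + 1] for k in range(top + 1)]


if __name__ == "__main__":
    ring = [(0, 1), (1, 2), (0, 2)]                   # hollow triangle: one loop
    cage = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]  # hollow tetrahedron: one void
    print(betti_numbers(ring))  # [1, 1]
    print(betti_numbers(cage))  # [1, 0, 1]
```

In this toy setting a ring-like structure has a one-dimensional space in degree 1 (a loop), while a closed cage has a one-dimensional space in degree 2 (a cavity). The method described in the abstract goes further by placing a distance on these vector spaces before comparing structures.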