AI Summary
Euclidean embeddings struggle to capture the nonlinear hierarchical structure inherent in biological sequences, which limits performance in sequence classification and similarity measurement. To address this, we propose a hyperbolic representation framework for genomic sequences based on the Poincaré ball model. Our method uses a learnable hypersurface feature mapping to embed discrete sequences into continuous hyperbolic space, preserving their intrinsic tree-like or hierarchical topology while substantially reducing dimensionality. We further introduce a kernel matrix built from hyperbolic inner products, enabling efficient and geometrically consistent modeling of pairwise sequence similarity. Experiments on multiple benchmark datasets show that our approach improves classification accuracy by 5.2% on average over Euclidean baselines and outperforms existing hyperbolic embedding methods in capturing biologically meaningful sequence correlations. This work establishes a theoretically grounded and practically effective paradigm for biological sequence analysis.
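The summary describes embeddings in the Poincaré ball, while the abstract works on the hyperboloid. These are isometric models of the same hyperbolic space, so a representation in one can be converted to the other. The sketch below uses the standard curvature -1 maps; the curvature and the direction of conversion are assumptions, since neither text specifies them, and the function names are illustrative.

```python
import numpy as np

def poincare_to_hyperboloid(y):
    """Map a point y with ||y|| < 1 in the Poincaré ball onto the hyperboloid (curvature -1)."""
    sq = np.dot(y, y)
    if sq >= 1.0:
        raise ValueError("point must lie strictly inside the unit ball")
    # x = ((1 + |y|^2), 2y) / (1 - |y|^2); satisfies -x0^2 + |x_rest|^2 = -1
    return np.concatenate(([1.0 + sq], 2.0 * y)) / (1.0 - sq)

def hyperboloid_to_poincare(x):
    """Inverse map: drop the time-like coordinate x0 and rescale by 1 / (1 + x0)."""
    return x[1:] / (1.0 + x[0])

def poincare_distance(u, v):
    """Geodesic distance between two points of the Poincaré ball."""
    diff = np.dot(u - v, u - v)
    denom = (1.0 - np.dot(u, u)) * (1.0 - np.dot(v, v))
    return np.arccosh(1.0 + 2.0 * diff / denom)
```

Because the maps are isometries, `poincare_distance(u, v)` equals `arccosh(-<x_u, x_v>_L)` for the corresponding hyperboloid points, where `<·,·>_L` is the Lorentzian (Minkowski) inner product.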
Abstract
Genomic sequence analysis plays a crucial role in many scientific and medical domains. Traditional machine-learning approaches that operate in high-dimensional Euclidean spaces often struggle to capture the complex relationships and hierarchical structure of sequence data, which hinders accurate sequence classification and similarity measurement. To address these challenges, this work proposes mapping the feature representations of biological sequences onto the hyperboloid model of hyperbolic space. A transformation places each sequence on the hyperboloid while preserving its inherent structural information. A kernel matrix is then computed from these hyperboloid features: each entry is the inner product of two sequences' hyperboloid feature vectors and measures their pairwise similarity, enabling more effective analysis of relationships among sequences. Experimental evaluation shows that the approach captures important sequence correlations and improves classification accuracy.
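The pipeline in the abstract (featurize, map onto the hyperboloid, build a kernel from inner products) can be sketched as follows. Several choices here are assumptions not stated in the abstract. The features are normalized k-mer frequencies produced by a hypothetical `kmer_features`. The transformation is the exponential map at the hyperboloid origin with curvature -1. The "inner product" is taken to be the Lorentzian one, negated so that each entry equals cosh of the geodesic distance.

```python
import itertools
import numpy as np

def kmer_features(seq, k=3, alphabet="ACGT"):
    """Hypothetical featurizer: normalized k-mer frequency vector of length len(alphabet)**k."""
    kmers = ["".join(p) for p in itertools.product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    v = np.zeros(len(kmers))
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip k-mers containing symbols outside the alphabet (e.g. N)
            v[j] += 1
    total = v.sum()
    return v / total if total > 0 else v

def to_hyperboloid(v):
    """Assumed transformation: exponential map at the origin (1, 0, ..., 0), curvature -1."""
    r = np.linalg.norm(v)
    if r == 0:
        return np.concatenate(([1.0], np.zeros_like(v)))
    return np.concatenate(([np.cosh(r)], np.sinh(r) * v / r))

def minkowski_inner(x, y):
    """Lorentzian inner product <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def hyperboloid_kernel(points):
    """Kernel matrix K[i, j] = -<x_i, x_j>_L = cosh(d(x_i, x_j)) >= 1."""
    X = np.vstack(points)
    signs = np.ones(X.shape[1])
    signs[0] = -1.0  # flip the time-like coordinate to get the Lorentzian product
    return -(X * signs) @ X.T
```

Usage:

```python
seqs = ["ACGTACGTGA", "ACGTTCGTGA", "GGGCCCAATT"]
K = hyperboloid_kernel([to_hyperboloid(kmer_features(s)) for s in seqs])
print(K.shape)  # (3, 3)
```

This Lorentzian Gram matrix is indefinite for any two distinct points: a 2×2 block [[1, c], [c, 1]] with c > 1 has the negative eigenvalue 1 - c. A kernel method that requires positive semidefiniteness would therefore need an adjustment. Alternatively, it could use the plain Euclidean dot product `X @ X.T` of the hyperboloid coordinates, which is positive semidefinite by construction. The abstract does not say which inner product is used.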